This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Accepted / Public

Authored by cdevadas on Apr 21 2022, 12:18 PM.

Details

Summary

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes, and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions: SGPR spilling fails
when there are not enough VGPRs, and we are forced to
spill the leftover SGPRs into memory during PEI. The
custom spill handling during PEI has many edge cases
and breaks the compiler from time to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
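
As a rough sketch of the new lowering (the register
names, lane index, and stack slot below are hypothetical;
only the opcode spellings follow AMDGPU MIR), a
spill/restore pair changes from a stack access into a
writelane/readlane on a fresh virtual VGPR:

```
; Before: the SGPR spill targets a stack slot
SI_SPILL_S32_SAVE killed $sgpr30, %stack.0, implicit $exec
...
$sgpr30 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec

; After: the spill becomes a writelane into lane 0 of a new
; virtual VGPR %0; the restore reads the same lane back. The
; subsequent VGPR regalloc run assigns %0 a physical register.
%0:vgpr_32 = V_WRITELANE_B32 killed $sgpr30, 0, %0(tied-def 0)
...
$sgpr30 = V_READLANE_B32 %0, 0
```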

Spilling to virtual registers will always succeed,
even in high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special
registers like the frame pointer, which is an
unproblematic case.

By spilling CSRs into virtual VGPR lanes, we might end up
with broken CFIs that can corrupt frame unwinding in the
debugger, causing either a crash or a terrible debugging
experience. This occurs when regalloc tries to spill or
split the live range of these virtual VGPRs: CFI entries
should also be inserted at these intermediate points to
propagate the unwind information correctly, which is not
currently implemented in the compiler. As a short-term
fix, we continue to spill CSR SGPRs into physical VGPR
lanes so that the debugger can correctly compute the
unwind information.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden.
cdevadas added inline comments.Apr 26 2022, 9:22 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
399

I don't think we can do any special handling for them at assembly emission.
IMPLICIT_DEF is handled/printed by the generic part of AsmPrinter and it won't reach the target-specific emitInstruction at all.

425–426

Will do.

llvm/test/CodeGen/AMDGPU/csr-sgpr-spill-live-ins.mir
17–19 ↗(On Diff #424263)

Will do.

llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
27

The auto-generator didn't generate the tied-operands correctly. I can hand-modify this test to show the tied operand. It's the simplest case.

59

This test is already hand-modified to check the tied operands.

195

I couldn't write one successfully. Will try some unstructured flow to force one.

cdevadas added inline comments.Apr 27 2022, 3:53 AM
llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
195

I don't think such a case exists. A fall-through block will have only one successor and that becomes the nearest dominator for its children.
It would be true even for any unstructured flow.

cdevadas updated this revision to Diff 425478.Apr 27 2022, 4:32 AM

Fixed the review comments.
Moved UpdateLaneVGPRDomInstr lambda into a separate function.
Implemented getClearedProperties to clear certain MF properties.
Test pre-commit + rebase.
Fixed the tied operand cases in certain tests.

As a follow-up, I think we need to address the loss of being able to share VGPR lanes for unrelated spills

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
267–268

Typo "the the". It's also not necessarily unstructured

296

IsDominatesChecked is confusingly named and expresses the code not the intent. SeenSpillInBlock?

llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll
233–235

This is an unfortunate regression but what I expected

arsenm added inline comments.Apr 27 2022, 2:07 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
322

As part of the follow up to allow spill slot sharing, I think we can move all of this allocation stuff out of SIMachineFunctionInfo and into SILowerSGPRSpills

cdevadas added inline comments.Apr 27 2022, 8:24 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
322

Ya, will try to move it entirely out of SIMachineFunctionInfo.

cdevadas updated this revision to Diff 425689.Apr 27 2022, 8:30 PM

Addressed the review comments.

cdevadas updated this revision to Diff 440294.Jun 27 2022, 10:13 AM
cdevadas edited the summary of this revision. (Show Details)

Code rebase.

arsenm accepted this revision.Jun 27 2022, 5:29 PM

LGTM. Might want to introduce an asm printer flag on the implicit_def to mark in the comment that it's for SGPR spills

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
270

Remove "Is there a better way to handle it?"

310

Extra ()s

This revision is now accepted and ready to land.Jun 27 2022, 5:29 PM

Should also remove the SpillSGPRToVGPR option and handling

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

Typo in my earlier comment. I want to do that as a separate patch.
I've identified a few more clean up that can be done while removing SpillSGPRToVGPR option.

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

Yes, that's fine

cdevadas updated this revision to Diff 440735.Jun 28 2022, 12:58 PM
cdevadas edited the summary of this revision. (Show Details)

Code rebase.

arsenm accepted this revision.Jun 28 2022, 3:39 PM

What happens when the register allocator decides to split a live range of virtual registers here, i.e. if it introduces a COPY?

arsenm requested changes to this revision.Jun 29 2022, 1:28 PM

What happens when the register allocator decides to split a live range of virtual registers here, i.e. if it introduces a COPY?

This is totally broken as soon as any of these spill. We need WWM spills if they do. We should boost their priority and they need a guaranteed register to save and restore exec. I’m not sure the best way to go about this

This revision now requires changes to proceed.Jun 29 2022, 1:28 PM
cdevadas updated this revision to Diff 464220.Sep 30 2022, 5:15 AM
cdevadas edited the summary of this revision. (Show Details)

Implemented WWM register spill. Reserved SGPR(s) needed for saving EXEC while manipulating the WWM spills. Included the reserved SGPRs serialization.
I couldn't reproduce the WWM COPY situation yet, even after running the internal PSDB tests, and I hope this patch is good to go.
Working on a follow-up patch to implement WWM Copy.

ruiling added a subscriber: ruiling.EditedOct 1 2022, 8:14 AM

AFAIK, the WWM register has some unmodeled liveness behavior, which makes it impossible to allocate wwm register together with normal vector register in one pass now.
For example(a typical if-then):

bb0:
  %0 = ...
  s_cbranch_execz %bb2

bb1:
  %1 = wwm_operation
  ... = %1
  %0 = ...

bb2:
  ... = %0

VGPR %0 was dead in bb1 and WWM-VGPR %1 was defined and used in bb1. As there is no live-range conflict between them, they have a chance to get assigned the same physical register. If this happens, certain lane of %0 might be overwritten when writing to %1. I am not sure if moving the SIPreAllocateWWMRegs between the sgpr allocation and the vgpr allocation might help your case? The key point is to request the SIPreAllocateWWMRegs allocate the wwm register usage introduced in SILowerSGPRSpills.

AFAIK, the WWM register has some unmodeled liveness behavior, which makes it impossible to allocate wwm register together with normal vector register in one pass now.
For example(a typical if-then):

bb0:
  %0 = ...
  s_cbranch_execz %bb2

bb1:
  %1 = wwm_operation
  ... = %1
  %0 = ...

bb2:
  ... = %0

VGPR %0 was dead in bb1 and WWM-VGPR %1 was defined and used in bb1. As there is no live-range conflict between them, they have a chance to get assigned the same physical register. If this happens, certain lane of %0 might be overwritten when writing to %1. I am not sure if moving the SIPreAllocateWWMRegs between the sgpr allocation and the vgpr allocation might help your case? The key point is to request the SIPreAllocateWWMRegs allocate the wwm register usage introduced in SILowerSGPRSpills.

Agree, allowing wwm-register allocation together with the regular vector registers would be error-prone due to miscomputed liveness data.
But I guess they are edge cases. Unlike the other WWM operations, the writelane/readlane for SGPR spill stores/restores would, in most cases, span across different blocks, and such a liveness miscomputation would be a rare combination.

IIRC, SIPreAllocateWWMRegs can help allocate only when we have enough free VGPRs. There is no live-range spill/split incorporated in this custom pass. It won’t help in the case of large functions with more SGPR spills.
The best approach would be to introduce another regalloc pipeline between the existing SGPR and VGPR allocations. The new pipeline should allocate only the WWM-registers.
It would, however, increase the compile time complexity further. But I’m not sure we have a better choice.

Agree, allowing wwm-register allocation together with the regular vector registers would be error-prone due to miscomputed liveness data.
But I guess they are edge cases. Unlike the other WWM operations, the writelane/readlane for SGPR spill stores/restores would, in most cases, span across different blocks, and such a liveness miscomputation would be a rare combination.

I think we need to make sure the idea is correct in all possible cases we can think of. The writelane/readlane shares the same behavior with WWM operation regarding to the issue here. That is: they may write to a VGPR lane that the corresponding thread is inactive. "spanning across different blocks" won't help on the problem. Even the writelane/readlane operations span across more than one thousand blocks, it can still be nested in an outer if-then structure.

Agree, allowing wwm-register allocation together with the regular vector registers would be error-prone due to miscomputed liveness data.
But I guess they are edge cases. Unlike the other WWM operations, the writelane/readlane for SGPR spill stores/restores would, in most cases, span across different blocks, and such a liveness miscomputation would be a rare combination.

I think we need to make sure the idea is correct in all possible cases we can think of. The writelane/readlane shares the same behavior with WWM operation regarding to the issue here. That is: they may write to a VGPR lane that the corresponding thread is inactive. "spanning across different blocks" won't help on the problem. Even the writelane/readlane operations span across more than one thousand blocks, it can still be nested in an outer if-then structure.

Yes, we should fix this case. And we don't see a better way other than introducing a new regalloc pipeline for WWM registers alone. The effort for that is yet to be scoped, and we are planning a follow-up patch to split the VGPR allocation.

cdevadas updated this revision to Diff 470483.Oct 25 2022, 7:29 AM

Moved VRegFlags into AMDGPU files. Introduced the MRI delegate callbacks and used the delegate method to propagate the virtual register flags.

cdevadas updated this revision to Diff 470714.Oct 25 2022, 10:51 PM

Simplified addDelegate function to reflect the recent changes made in D134950.

Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2068

Why does SCC need to be dead? What happens if another instruction right after uses it?

cdevadas added inline comments.Oct 26 2022, 1:43 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2068

The code here only manipulates the exec mask, and no other instruction depends on the SCC it produces, so we mark it dead to avoid unwanted side effects. We don't have an alternate instruction that doesn't clobber SCC.

Pierre-vh added inline comments.Oct 26 2022, 1:50 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2068

Ah that makes sense, but shouldn't this check that it's not inserting in a place where SCC is alive?
I was trying out this patch and I have a case where it's causing issues:

S_CMP_EQ_U32 killed renamable $sgpr6, killed renamable $sgpr7, implicit-def $scc
renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr4, 4, $vgpr0(tied-def 0), implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr5, 5, $vgpr0(tied-def 0), implicit killed $sgpr4_sgpr5
$sgpr10_sgpr11 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
$agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
$exec = S_MOV_B64 killed $sgpr10_sgpr11
S_CBRANCH_SCC1 %bb.5, implicit killed $scc

Insertion is between the S_CMP and the S_CBRANCH.

cdevadas added inline comments.Oct 26 2022, 1:59 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2068

Yes, the check is already in place. See the code above, the if condition, that inserts two separate move instructions when SCC is live and the else part uses SCC when it is free.
Not sure why RegScavenger returned false. It should have returned SCC as clobbered.

cdevadas added inline comments.Oct 26 2022, 2:10 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2068

See test llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll
A similar situation is handled. RS returned the correct liveness info for SCC.

cdevadas updated this revision to Diff 470903.Oct 26 2022, 12:42 PM
cdevadas edited the summary of this revision. (Show Details)

Rebase after recent changes in D134950.

cdevadas updated this revision to Diff 472291.Nov 1 2022, 7:09 AM

Code rebase.

Pierre-vh added inline comments.Nov 2 2022, 3:37 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1358

I think this is missing and it's what's causing verification errors with "Using an undefined physical register" that I was talking about.
The current code just tells the scavenger to enter that block but it doesn't update it to the right instruction, so eliminateFrameIndex is working with information from the start of the BB, not from the MI it's dealing with

cdevadas added inline comments.Nov 2 2022, 4:05 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1358

An entirely different problem and needs to be implemented separately. The code that handles the register liveness update is implemented in PEI::replaceFrameIndices and it tracks the loops and invokes RS->forward() appropriately to update the liveness info. I guess we should bring this code into VGPR to AGPR spill path.

cdevadas updated this revision to Diff 472666.Nov 2 2022, 10:18 AM

Included the patch provided by @Pierre-vh to correctly update the register liveness in the RegisterScavenger during VGPR -> AGPR spilling.
This patch avoids a crash that occurred when SGPR spill to virtual VGPR lanes was enabled.

diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1511,52 +1511,52 @@ void SIFrameLowering::processFunctionBeforeFrameFinalized(
                       && EnableSpillVGPRToAGPR;

       if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI,
                                             TRI->isAGPR(MRI, VReg))) {
-        // FIXME: change to enterBasicBlockEnd()
-        RS->enterBasicBlock(MBB);
+        RS->enterBasicBlockEnd(MBB);
+        RS->backward(MI);
         TRI->eliminateFrameIndex(MI, 0, FIOp, RS);
         SpillFIs.set(FI);
         continue;
Included the new test llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll.

arsenm added inline comments.Nov 14 2022, 1:32 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1356–1358

D137574 is in flight to invert the direction, should we land that first / separately?

llvm/lib/Target/AMDGPU/SIInstrInfo.h
628 ↗(On Diff #472666)

static?

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
60

Is this introducing a new computation in the pass pipeline (I assume not since I don't see a pass pipeline test update)

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
665

Reg.isVirtual()

671

Reg.isVirtual()

arsenm added inline comments.Nov 14 2022, 1:32 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
478–483

I don't like having state here for a single operation that's happening in one pass and isn't valid for multiple uses. I don't really understand how this is being set and passed around

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
652

Isn't this always required?

cdevadas added inline comments.Nov 15 2022, 10:30 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1356–1358

Alex's patch has landed. But this code is still needed to update the liveness for each instruction as eliminateFrameIndex is called here.

llvm/lib/Target/AMDGPU/SIInstrInfo.h
628 ↗(On Diff #472666)

Will change.

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
60

It isn't.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
478–483

CurrentVRegSpilled is needed to track the virtual register (live range) for which the physical register was assigned, and it is needed only for fast regalloc. We need this mapping to correctly track the WWM spills, as RegAllocFast spills/restores the physical registers directly since there is no VRM. This will be appropriately set with the delegate MRI_NoteVirtualRegisterSpill, which is inserted in the RegAllocFast spill/reload functions.

SIMachineFunctionInfo is where the delegates are currently handled and I don't have a better place to move it.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
652

No. They are reserved only if RA inserts any whole wave spill.

cdevadas updated this revision to Diff 475518.Nov 15 2022, 10:44 AM

Rebase + Suggestions incorporated.

cdevadas updated this revision to Diff 477532.Nov 23 2022, 9:25 AM

Rebase + Incorporated changes after D138515 to move the handling of physReg to current VirtReg mapping entirely into the generic design.

cdevadas updated this revision to Diff 477752.Nov 24 2022, 5:02 AM

Implemented the WWM spill during RegAllocFast using the additional argument to the spiller interface introduced with patch D138656.

arsenm accepted this revision.Dec 15 2022, 10:45 AM
This revision is now accepted and ready to land.Dec 15 2022, 10:45 AM
This revision was landed with ongoing or failed builds.Dec 16 2022, 10:27 PM
This revision was automatically updated to reflect the committed changes.
jdoerfert reopened this revision.Dec 19 2022, 11:14 PM
jdoerfert added subscribers: jtramm, jhuber6, ronlieb, jdoerfert.

This patch causes OpenMC offloaded via OpenMP on AMDGPUs to crash at runtime. It looks like some corruption in the memory address.
You can find build instructions here: https://github.com/jtramm/openmc_offloading_builder

The commit before this one works fine though, assuming you cherry picked https://reviews.llvm.org/rGee1d000d43321590771a2f047c8c55d07d09ad28 first as it landed after.
I assume other codes will be impacted too.

@jtramm @ronlieb @jhuber6 FYI

This revision is now accepted and ready to land.Dec 19 2022, 11:14 PM

This patch causes OpenMC offloaded via OpenMP on AMDGPUs to crash at runtime. It looks like some corruption in the memory address.
You can find build instructions here: https://github.com/jtramm/openmc_offloading_builder

The commit before this one works fine though, assuming you cherry picked https://reviews.llvm.org/rGee1d000d43321590771a2f047c8c55d07d09ad28 first as it landed after.
I assume other codes will be impacted too.

@jtramm @ronlieb @jhuber6 FYI

Thanks. Going to take a look.

cdevadas updated this revision to Diff 496532.Feb 10 2023, 9:56 AM

Rebased after whole-wave copy implementation.

yassingh updated this revision to Diff 523333.May 18 2023, 4:09 AM
yassingh added a subscriber: yassingh.
  • Rebased
  • Incorporated the downstream code
cdevadas edited the summary of this revision. (Show Details)May 18 2023, 5:06 AM
arsenm added inline comments.Jun 21 2023, 5:27 PM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
66

It shouldn't have been SSA to begin with, and this doesn't de-SSA

67

Add a comment explaining the new vregs?

399

You don't need to specially handle the instruction, see AsmPrinterFlags

arsenm requested changes to this revision.Jun 22 2023, 10:55 AM

Just a few more nits

This revision now requires changes to proceed.Jun 22 2023, 10:55 AM
yassingh added inline comments.Jun 26 2023, 4:53 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
66

Removing this line causes the machine verifier to crash in a few tests. Any hints @cdevadas ?

399

Tried adding a new flag here D153754

yassingh updated this revision to Diff 534590.Jun 26 2023, 8:57 AM

Review comments

yassingh added inline comments.Jun 26 2023, 9:07 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
66

Removing this line works fine when running the whole pipeline, as the compiler knows the code here is not in SSA form. However, when SILowerSGPRSpills and related passes are run in isolation, the verifier assumes the code is in SSA form (possibly a bug there; also, we are introducing virtual VGPRs, maybe that's the reason).

I can leave the line as it is, or is there some way to update the test files to let the compiler know the input isn't SSA? I tried "isSSA: false"; it didn't work.

cdevadas added inline comments.Jun 26 2023, 9:41 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
66

Seems reasonable to retain this line for now. The compiler might not be able to decide that this pass is run post phi-elimination and assume SSA form by default. There must be a serialized option to control it for MIR tests.

yassingh added inline comments.Jun 26 2023, 9:23 PM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
66

Yeah, MIRParser::isSSA recomputes the SSA information and sets it to true, also it doesn't expose a way to override it.

yassingh updated this revision to Diff 535257.Jun 28 2023, 12:08 AM

Rebase over ancestor patch changes.

arsenm accepted this revision.Jun 28 2023, 9:25 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1356–1357

This is a pre-existing issue that should be fixed, but we should not be scanning the entire block from the end on every spill. The block iteration should be reversed and we should lazily call enterBasicBlockEnd on the first seen spill
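
A rough sketch of that suggestion in C++-style pseudocode (hypothetical helper names; not actual LLVM code):

```
// Walk the block bottom-up and enter the scavenger lazily at the
// first spill seen, so blocks without spills cost nothing and each
// later spill only steps backward from the previous one instead of
// rescanning the whole block from the end.
bool Entered = false;
for (MachineInstr &MI : reverse(MBB)) {
  if (!isSpillCandidate(MI))
    continue;
  if (!Entered) {
    RS->enterBasicBlockEnd(MBB);
    Entered = true;
  }
  RS->backward(MI);
  // ... eliminateFrameIndex(MI, ...) as before ...
}
```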

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
66

this is kind of a mir parser bug

This revision is now accepted and ready to land.Jun 28 2023, 9:25 AM
yassingh updated this revision to Diff 537989.Jul 6 2023, 10:31 PM

fix comment

yassingh updated this revision to Diff 538200.Jul 7 2023, 10:40 AM

Rebase before merge

This revision was landed with ongoing or failed builds.Jul 7 2023, 10:46 AM
This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Jul 20 2023, 9:29 AM