[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Needs Review · Public

Authored by cdevadas on Apr 21 2022, 12:18 PM.

Details

Summary

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes, and the remaining
VGPRs are used by regalloc for vector regclass allocation.
This imposes many restrictions, and we ended up with
unsuccessful SGPR spilling when there were not enough
VGPRs, forcing us to spill the leftover SGPRs into
memory during PEI. The custom spill handling during PEI
has many edge cases and breaks the compiler from time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spilling to virtual registers will always succeed,
even in high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer, which is an unproblematic case.

This patch also implements whole wave spills, which
might occur if RA spills any live range of virtual registers
involved in whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now, with SGPR spills going into virtual VGPR lanes,
we are exposing them to the allocator.
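To illustrate the lowering the summary describes, here is a hypothetical MIR sketch. The opcode names (SI_SPILL_S32_SAVE, V_WRITELANE_B32, V_READLANE_B32) are real AMDGPU instructions, but the register numbers, lane index, and operand details are illustrative and not taken from the patch:

```
; Before: an SGPR spill pseudo that PEI would have to lower, possibly
; to memory when no physical VGPR lane is free
SI_SPILL_S32_SAVE killed $sgpr5, %stack.0, implicit $exec
$sgpr5 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec

; After SILowerSGPRSpills with this patch: the value is written into a
; lane of a *virtual* VGPR, which the later VGPR regalloc run assigns
%1:vgpr_32 = IMPLICIT_DEF
%1:vgpr_32 = V_WRITELANE_B32 killed $sgpr5, 0, %1(tied-def 0)
$sgpr5 = V_READLANE_B32 %1, 0
```

Because %1 is virtual, the VGPR allocation pass that runs after SGPR allocation can always find (or spill to make) a home for it, which is what makes the spill unconditionally successful.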

Diff Detail

Event Timeline

arsenm accepted this revision.Jun 27 2022, 5:29 PM

LGTM. Might want to introduce an asm printer flag on the implicit_def to mark in the comment that it's for SGPR spills

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
258

Remove "Is there a better way to handle it?"

298

Extra ()s

This revision is now accepted and ready to land.Jun 27 2022, 5:29 PM

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

Typo in my earlier comment. I want to do that as a separate patch.
I've identified a few more cleanups that can be done while removing the SpillSGPRToVGPR option.

Yes, that's fine

cdevadas updated this revision to Diff 440735.Jun 28 2022, 12:58 PM
cdevadas edited the summary of this revision. (Show Details)

Code rebase.

arsenm accepted this revision.Jun 28 2022, 3:39 PM

What happens when the register allocator decides to split a live range of virtual registers here, i.e. if it introduces a COPY?

arsenm requested changes to this revision.Jun 29 2022, 1:28 PM

This is totally broken as soon as any of these spill. We need WWM spills if they do. We should boost their priority and they need a guaranteed register to save and restore exec. I’m not sure the best way to go about this
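The guaranteed exec save/restore sequence under discussion looks roughly like the following sketch (the SGPR pair and the particular spill instruction are illustrative):

```
; Save exec into a reserved SGPR pair and enable all lanes
$sgpr10_sgpr11 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; Whole wave move of the VGPR holding spilled SGPR lanes
$agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; Restore the original exec mask
$exec = S_MOV_B64 killed $sgpr10_sgpr11
```

The SGPR pair must be guaranteed to be free at the spill point, which is why it needs to be reserved rather than scavenged.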

This revision now requires changes to proceed.Jun 29 2022, 1:28 PM
cdevadas updated this revision to Diff 464220.Sep 30 2022, 5:15 AM
cdevadas edited the summary of this revision. (Show Details)

Implemented WWM register spills. Reserved the SGPR(s) needed for saving EXEC while manipulating the WWM spills, and included serialization of the reserved SGPRs.
I couldn't reproduce the WWM COPY situation yet, even after running the internal PSDB tests, and I hope this patch is good to go.
Working on a follow-up patch to implement WWM COPY.

ruiling added a subscriber: ruiling.EditedOct 1 2022, 8:14 AM

AFAIK, the WWM register has some unmodeled liveness behavior, which makes it impossible to allocate WWM registers together with normal vector registers in one pass now.
For example (a typical if-then):

bb0:
  %0 = ...
  s_cbranch_execz %bb2

bb1:
  %1 = wwm_operation
  ... = %1
  %0 = ...

bb2:
  ... = %0

VGPR %0 is dead in bb1, and WWM-VGPR %1 is defined and used in bb1. As there is no live-range conflict between them, they have a chance to be assigned the same physical register. If this happens, certain lanes of %0 might be overwritten when writing to %1. I am not sure if moving SIPreAllocateWWMRegs between the SGPR allocation and the VGPR allocation might help your case? The key point is to have SIPreAllocateWWMRegs allocate the WWM register usage introduced in SILowerSGPRSpills.
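Concretely, if the allocator did assign %0 and %1 the same physical register, the hazard would look like this (the assignment to $vgpr0 is illustrative):

```
bb1:
  ; was %1: the WWM write runs with all lanes enabled, so it also
  ; clobbers the lanes of $vgpr0 belonging to threads that took the
  ; s_cbranch_execz and still need %0's bb0 value at the use in bb2
  $vgpr0 = wwm_operation ...
  ...    = $vgpr0
  $vgpr0 = ...   ; was %0: rewritten for active lanes only
```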

Agreed, allowing WWM-register allocation together with the regular vector registers would be error prone with miscomputed liveness data.
But I guess they are edge cases. Unlike the other WWM operations, the writelane/readlane instructions for SGPR spill stores/restores would, in most cases, span different blocks, and such a liveness miscomputation would be a rare combination.

IIRC, SIPreAllocateWWMRegs can help allocate only when we have enough free VGPRs. There is no live-range spill/split incorporated in this custom pass, so it won't help in the case of large functions with many SGPR spills.
The best approach would be to introduce another regalloc pipeline between the existing SGPR and VGPR allocations. The new pipeline should allocate only the WWM registers.
It would, however, increase compile-time complexity further. But I'm not sure we have a better choice.

I think we need to make sure the idea is correct in all possible cases we can think of. The writelane/readlane operations share the same behavior as WWM operations with regard to the issue here: they may write to a VGPR lane whose corresponding thread is inactive. "Spanning different blocks" won't help with the problem. Even if the writelane/readlane operations span more than a thousand blocks, they can still be nested in an outer if-then structure.

Yes, we should fix this case. And we don't see a better way other than introducing a new regalloc pipeline for WWM registers alone. The effort for that is yet to be estimated, and we are planning a follow-up patch to split the VGPR allocation.

cdevadas updated this revision to Diff 470483.Oct 25 2022, 7:29 AM

Moved VRegFlags into AMDGPU files. Introduced the MRI delegate callbacks and used the delegate method to propagate the virtual register flags.

cdevadas updated this revision to Diff 470714.Oct 25 2022, 10:51 PM

Simplified addDelegate function to reflect the recent changes made in D134950.

Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1992

Why does SCC need to be dead? What happens if another instruction right after uses it?

cdevadas added inline comments.Oct 26 2022, 1:43 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1992

The code here only manipulates the exec mask, and no other instruction depends on the SCC it produces, so we should mark it dead to avoid unwanted side effects. We don't have an alternative instruction that doesn't clobber SCC.

Pierre-vh added inline comments.Oct 26 2022, 1:50 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1992

Ah, that makes sense, but shouldn't this check that it's not inserting in a place where SCC is live?
I was trying out this patch and I have a case where it's causing issues:

S_CMP_EQ_U32 killed renamable $sgpr6, killed renamable $sgpr7, implicit-def $scc
renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr4, 4, $vgpr0(tied-def 0), implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr5, 5, $vgpr0(tied-def 0), implicit killed $sgpr4_sgpr5
$sgpr10_sgpr11 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
$agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
$exec = S_MOV_B64 killed $sgpr10_sgpr11
S_CBRANCH_SCC1 %bb.5, implicit killed $scc

Insertion is between the S_CMP and the S_CBRANCH.

cdevadas added inline comments.Oct 26 2022, 1:59 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1992

Yes, the check is already in place. See the code above: the if condition inserts two separate move instructions when SCC is live, and the else part uses SCC when it is free.
Not sure why the RegScavenger returned false; it should have reported SCC as clobbered.

cdevadas added inline comments.Oct 26 2022, 2:10 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1992

See the test llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll.
A similar situation is handled there, and RS returned the correct liveness info for SCC.

cdevadas updated this revision to Diff 470903.Oct 26 2022, 12:42 PM
cdevadas edited the summary of this revision. (Show Details)

Rebase after recent changes in D134950.

cdevadas updated this revision to Diff 472291.Nov 1 2022, 7:09 AM

Code rebase.

Pierre-vh added inline comments.Nov 2 2022, 3:37 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1351 ↗(On Diff #472479)

I think this is missing, and it's what's causing the "Using an undefined physical register" verification errors I was talking about.
The current code just tells the scavenger to enter that block, but it doesn't advance it to the right instruction, so eliminateFrameIndex is working with information from the start of the BB, not from the MI it's dealing with.

cdevadas added inline comments.Nov 2 2022, 4:05 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1351 ↗(On Diff #472479)

That is an entirely different problem and needs to be fixed separately. The code that handles the register liveness update is implemented in PEI::replaceFrameIndices; it tracks the loops and invokes RS->forward() appropriately to update the liveness info. I guess we should bring this code into the VGPR-to-AGPR spill path.

cdevadas updated this revision to Diff 472666.Nov 2 2022, 10:18 AM

Included the patch provided by @Pierre-vh to correctly update the register liveness in the RegisterScavenger during VGPR -> AGPR spilling.
This avoids a crash that occurred when SGPR spilling to virtual VGPR lanes was enabled.

diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1511,52 +1511,52 @@ void SIFrameLowering::processFunctionBeforeFrameFinalized(
                        && EnableSpillVGPRToAGPR;

       if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI,
                                             TRI->isAGPR(MRI, VReg))) {
-        // FIXME: change to enterBasicBlockEnd()
-        RS->enterBasicBlock(MBB);
+        RS->enterBasicBlockEnd(MBB);
+        RS->backward(MI);
         TRI->eliminateFrameIndex(MI, 0, FIOp, RS);
         SpillFIs.set(FI);
         continue;
Included the new test llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll.

arsenm added inline comments.Mon, Nov 14, 1:32 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1349–1350 ↗(On Diff #472666)

D137574 is in flight to invert the direction, should we land that first / separately?

llvm/lib/Target/AMDGPU/SIInstrInfo.h
628 ↗(On Diff #472666)

static?

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
59

Is this introducing a new computation in the pass pipeline? (I assume not, since I don't see a pass pipeline test update.)

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
490–495 ↗(On Diff #472666)

I don't like having state here for a single operation that's happening in one pass and isn't valid for multiple uses. I don't really understand how this is being set and passed around

688 ↗(On Diff #472666)

Reg.isVirtual()

694 ↗(On Diff #472666)

Reg.isVirtual()

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
651

Isn't this always required?

cdevadas added inline comments.Tue, Nov 15, 10:30 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1349–1350 ↗(On Diff #472666)

Alex's patch has landed. But this code is still needed to update the liveness for each instruction as eliminateFrameIndex is called here.

llvm/lib/Target/AMDGPU/SIInstrInfo.h
628 ↗(On Diff #472666)

Will change.

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
59

It isn't.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
490–495 ↗(On Diff #472666)

CurrentVRegSpilled is needed to track the virtual register (live range) for which the physical register was assigned, and it is needed only for the fast regalloc. We need this mapping to correctly track the WWM spills, as RegAllocFast spills/restores physical registers directly since there is no VRM. It will be set appropriately via the MRI_NoteVirtualRegisterSpill delegate, which is invoked from the RegAllocFast spill/reload functions.

SIMachineFunctionInfo is where the delegates are currently handled, and I don't have a better place to move it.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
651

No. They are reserved only if RA inserts any whole wave spill.

cdevadas updated this revision to Diff 475518.Tue, Nov 15, 10:44 AM

Rebase + Suggestions incorporated.

cdevadas updated this revision to Diff 477532.Wed, Nov 23, 9:25 AM

Rebase + incorporated changes after D138515 to move the handling of the physReg-to-current-virtReg mapping entirely into the generic design.

cdevadas updated this revision to Diff 477752.Thu, Nov 24, 5:02 AM

Implemented the WWM spill during RegAllocFast using the additional argument to the spiller interface introduced with patch D138656.