This is an archive of the discontinued LLVM Phabricator instance.

Differential D124195

[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
ClosedPublic

Authored by cdevadas on Apr 21 2022, 12:16 PM.

Details

Reviewers

arsenm
rampitec
sebastian-ne
nhaehnle

Commits

rGb25b4c0ab4ad: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI

Summary

SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.
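
To make the lane bookkeeping in the summary concrete, here is a minimal, self-contained sketch of how a running lane counter can map the 32-bit pieces of an SGPR spill onto VGPR lanes. It is a toy model, not the LLVM implementation: `LaneCounter`, `SpillSlot`, and `kWaveSize` are hypothetical names, and spill VGPRs are represented by plain indices. Keeping one counter per phase mirrors the intent described above, where SILowerSGPRSpills and PEI no longer share one search for a free lane.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wave size for this sketch; a VGPR has one lane per thread.
constexpr unsigned kWaveSize = 64;

// One 32-bit piece of a spilled SGPR: which spill VGPR and which lane.
struct SpillSlot {
  unsigned VGPRIndex; // index into this phase's list of spill VGPRs
  unsigned Lane;      // lane within that VGPR, 0 .. kWaveSize - 1
};

// Toy per-phase allocator: lanes are handed out in order, and a new
// spill VGPR is implied whenever the previous one is full.
class LaneCounter {
  unsigned NumLanesUsed = 0;

public:
  // Allocate NumDwords consecutive lanes for one SGPR spill.
  std::vector<SpillSlot> allocate(unsigned NumDwords) {
    assert(NumDwords > 0 && "a spill needs at least one lane");
    std::vector<SpillSlot> Slots;
    for (unsigned I = 0; I < NumDwords; ++I, ++NumLanesUsed)
      Slots.push_back({NumLanesUsed / kWaveSize, NumLanesUsed % kWaveSize});
    return Slots;
  }

  // Number of spill VGPRs this phase has needed so far.
  unsigned numVGPRs() const {
    return (NumLanesUsed + kWaveSize - 1) / kWaveSize;
  }
};
```

With two independent `LaneCounter` objects, one per phase, a spill made in the second phase always starts in its own VGPR even if the first phase left lanes unused. That is the extra-VGPR cost the reviewers ask about in the test changes.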

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cdevadas created this revision.Apr 21 2022, 12:16 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:16 PM

Herald added subscribers: hsmhsm, foad, kerbowa and 10 others. · View Herald Transcript

cdevadas requested review of this revision.Apr 21 2022, 12:16 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:16 PM

Herald added subscribers: llvm-commits, wdng. · View Herald Transcript

Harbormaster completed remote builds in B160699: Diff 424262.Apr 21 2022, 12:17 PM

cdevadas added a parent revision: D124194: [AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog.Apr 21 2022, 12:18 PM

cdevadas added a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

foad added inline comments.Apr 22 2022, 3:05 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	Seems like a regression. Does this get fixed by a later patch?

cdevadas added inline comments.Apr 25 2022, 4:20 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	Yes, it is. With SGPRs spilled into virtual VGPR lanes, it won't be directly possible to track the unused lanes of the physical VGPR allocated for the last virtual register created during the `SILowerSGPRSpills` pass. I am going to insert a custom pass in the VGPR regalloc pipeline to map the physReg from virtRegMap. That way, we can reuse the VGPR for any custom SGPR spills during PEI if free lanes are available. However, this regression can only be avoided at higher optimization levels. `regallocfast` doesn't provide a way to correctly map a virtual register to a PhysReg, so we can't avoid this extra VGPR usage when compiling at -O0.

arsenm added inline comments.Apr 25 2022, 1:44 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	I'm not sure a separate pass using VirtRegMap is the right solution to merging spill VGPRs of different SGPRs, but either way this is a separate optimization that needs to be re-implemented.

cdevadas added inline comments.Apr 26 2022, 3:45 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	It's worth implementing when it comes to saving a VGPR. Yep, planning it as a separate patch.

Code rebase

Herald added subscribers: kosarev, jsilvanus. · View Herald TranscriptJun 27 2022, 10:05 AM

Harbormaster completed remote builds in B172247: Diff 440292.Jun 27 2022, 11:54 AM

arsenm added inline comments.Jun 28 2022, 11:51 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
442	Needs to document what "custom" means. Also the fact that it's not serialized makes me nervous. However, that should be OK since this is only set and read in PEI so it should be OK. Ideally we would have somewhere else to put it

Added a meaningful comment for SGPRToVGPRCustomSpills.

Harbormaster completed remote builds in B172561: Diff 440730.Jun 28 2022, 2:07 PM

arsenm added inline comments.Jun 28 2022, 3:41 PM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
449	Isn't this count implied by SGPRToVGPRCustomSpills.size()? I'd like to avoid multiplying the number of unserialized fields

cdevadas removed a parent revision: D124194: [AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog.Jun 29 2022, 9:10 AM

cdevadas removed a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.Jun 29 2022, 9:17 AM

cdevadas added inline comments.Sep 23 2022, 5:37 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
449	Currently, the function `SIMachineFunctionInfo::allocateSGPRSpillToVGPR` needs this variable to choose the num spills between the SILowerSGPRSpills pass and the custom spills later during FrameLowering. I'm planning to move these functions entirely out of SIMachineFunctionInfo and can avoid these variables entirely.

Rebase

Harbormaster completed remote builds in B188383: Diff 462455.Sep 23 2022, 5:39 AM

Ping.

Ping

cdevadas added a parent revision: D124194: [AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog.Oct 27 2022, 11:33 PM

cdevadas added a child revision: D132436: [AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores.

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.
They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.
How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)
Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.
A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
461–470	Simplify this further to a simpler for loop and finally `SGPRToVGPRSpills.clear()`
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
440–441	s/wave index/lane index/?
442	Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, name it accordingly.

The patch is reasonable in terms of what it does, by the way, just that the code is a mess and I think it should be cleaned up reasonably while it's being touched anyway.
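
The grouping suggested in this comment (a FrameIndex to (VGPR, lane) map plus an explicitly tracked "open" VGPR and next lane) could be sketched roughly as follows. This is an illustration under assumptions, not code from the patch: `SGPRToVGPRSpillState`, `SpilledLane`, `PhysReg`, and the `NewVGPR` callback are hypothetical, and registers are plain integers.

```cpp
#include <cassert>
#include <map>
#include <vector>

using PhysReg = unsigned; // stand-in for a physical VGPR number

struct SpilledLane {
  PhysReg VGPR;
  unsigned Lane;
};

// Hypothetical container for one kind of SGPR-to-VGPR spill. One instance
// would serve SILowerSGPRSpills and another the prolog/epilog spills.
struct SGPRToVGPRSpillState {
  unsigned WaveSize;
  std::map<int, std::vector<SpilledLane>> LanesForFrameIndex;
  bool HasOpenVGPR = false; // is there a VGPR that may still have free lanes?
  PhysReg OpenVGPR = 0;     // the currently "open" VGPR, tracked explicitly
  unsigned NextLane = 0;    // next free lane in OpenVGPR

  explicit SGPRToVGPRSpillState(unsigned WS) : WaveSize(WS) {
    assert(WS > 0 && "wave size must be positive");
  }

  // Assign NumDwords lanes to frame index FI. NewVGPR is a caller-supplied
  // callback returning a fresh spill VGPR when the open one is full.
  template <typename NewVGPRFn>
  const std::vector<SpilledLane> &allocate(int FI, unsigned NumDwords,
                                           NewVGPRFn NewVGPR) {
    std::vector<SpilledLane> &Lanes = LanesForFrameIndex[FI];
    if (!Lanes.empty())
      return Lanes; // this frame index was already assigned lanes
    for (unsigned I = 0; I < NumDwords; ++I) {
      if (!HasOpenVGPR || NextLane == WaveSize) {
        OpenVGPR = NewVGPR();
        HasOpenVGPR = true;
        NextLane = 0;
      }
      Lanes.push_back({OpenVGPR, NextLane++});
    }
    return Lanes;
  }
};
```

Because the open VGPR is stored in the struct, nothing has to be inferred from the back of an unrelated list, and a single spill may straddle two VGPRs when the open one runs out of lanes.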

In D124195#3892023, @nhaehnle wrote:

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.

They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.

How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)

Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.

A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.

Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.
Hope that would be ok. For now, I will change the term “custom” in this review and can use a better name.
I don’t like the name “custom” either, but I couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?

In D124195#3892634, @cdevadas wrote:

Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.

Hmm, I suppose we can live with that. I keep wishing for better ways to review patch series like this one. I miss e-mail based reviews :(

Hope that would be ok. For now, I will change the term “custom” in this review and can use a better name.
I don’t like the name “custom” either, but I couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?

Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?

Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?

That's right.

Removed the prefix "Custom" from the SGPR spills during PrologEpilogInserter and used a meaningful name instead.

Harbormaster completed remote builds in B195426: Diff 472236.Nov 1 2022, 1:45 AM

Made allocateVGPRForSGPRSpills & allocateVGPRForPrologEpilogSGPRSpills methods private.

Harbormaster completed remote builds in B195453: Diff 472278.Nov 1 2022, 6:40 AM

In D124195#3892023, @nhaehnle wrote:

How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)

I believe this is actually an optimization we're regressing on with the switch to spilling to virtual VGPRs. It will need to be reimplemented as a new optimization

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
348–351	Can we defer this until after all the spills are handled?

Deferred adding lane VGPR into BBLiveIns until all SGPR spills are handled.

Harbormaster completed remote builds in B195606: Diff 472474.Nov 1 2022, 6:44 PM

arsenm added inline comments.Nov 1 2022, 6:59 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302	Actually, do we really need to do this anymore? If they were allocated from virtual registers, they should have correct live-in lists already.

cdevadas added inline comments.Nov 1 2022, 7:27 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302	They are needed for prolog/epilog spill insertion. If we don't mark them liveIn, there will be a MIR verifier error indicating the use of undefined registers in spill instructions.

Ping

arsenm added inline comments.Nov 7 2022, 8:00 AM

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
415	Why the behavior change? Is this restored in a later patch?

cdevadas added inline comments.Nov 7 2022, 8:23 AM

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
415	It's already been discussed. Jay asked about the same thing earlier in this review. I'm planning a follow-up patch to regain it. Using the VRM map, the unused lanes of the last VGPR virtual register allocated for SGPR spilling can be tracked and reused later during FrameLowering when trying to spill FP/BP.

Ping

arsenm added inline comments.Nov 14 2022, 11:54 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302	This feels too coarse-grained. The whole point of doing this was to allocate these like normal virtual registers, which should then have naturally set live-ins already. Is this only handling the prolog/epilog cases? It should only need to do anything for those.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
307–314	I think this referenced error cannot happen anymore
349	IsPEI feels like the wrong name. IsPrologEpilog would be a bit better but not great

cdevadas added inline comments.Nov 14 2022, 10:08 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302	Yes, they are needed only for prolog/epilog spill cases.

Renamed IsPEI to IsPrologEpilog & removed the unwanted comment.

Harbormaster completed remote builds in B197669: Diff 475339.Nov 14 2022, 10:15 PM

arsenm added inline comments.Nov 14 2022, 10:53 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302	But getWWMSpills covers everything? Is this adding excess live-ins?

cdevadas added inline comments.Nov 14 2022, 11:06 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302	It doesn't necessarily add the live-ins at the entry block. We insert the spill to a virt-VGPR at a block, properly adding the IMPLICIT_DEF at its dominator block. The physical VGPR allocated for this virt-VGPR should be added to the prolog block live-ins; otherwise the verifier would complain about its spill store using an undefined register.
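
The live-in argument in this exchange can be illustrated with a small toy model. It is a simplified sketch, not LLVM code: `ToyBlock`, `ToyInstr`, and `usesAreDefined` are hypothetical, and the check is a block-local stand-in for what the machine verifier does with real liveness information. The `addLiveIn` and `sortUniqueLiveIns` members only imitate the `MachineBasicBlock` methods of the same name used in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using PhysReg = unsigned; // stand-in for a physical register number

// A toy instruction either defines or uses a single register.
struct ToyInstr {
  bool IsDef;
  PhysReg Reg;
};

struct ToyBlock {
  std::vector<PhysReg> LiveIns;
  std::vector<ToyInstr> Instrs;

  void addLiveIn(PhysReg R) { LiveIns.push_back(R); }

  // Sort the live-in list and drop duplicate entries.
  void sortUniqueLiveIns() {
    std::sort(LiveIns.begin(), LiveIns.end());
    LiveIns.erase(std::unique(LiveIns.begin(), LiveIns.end()), LiveIns.end());
  }
};

// Toy version of the rule under discussion: every use must be preceded by
// a def in the same block, or the register must be live-in to the block.
bool usesAreDefined(const ToyBlock &BB) {
  std::vector<PhysReg> Defined = BB.LiveIns;
  for (const ToyInstr &I : BB.Instrs) {
    bool Known =
        std::find(Defined.begin(), Defined.end(), I.Reg) != Defined.end();
    if (I.IsDef) {
      if (!Known)
        Defined.push_back(I.Reg);
    } else if (!Known) {
      return false; // use of a register with no visible definition
    }
  }
  return true;
}
```

In this model, a prolog block whose first instruction is a spill store of the lane VGPR fails the check until that VGPR is added as a live-in, which is the situation described in the reply above.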

arsenm accepted this revision.Nov 17 2022, 5:33 PM

This revision is now accepted and ready to land.Nov 17 2022, 5:33 PM

code rebase

Harbormaster completed remote builds in B203143: Diff 482885.Dec 14 2022, 9:06 AM

arsenm accepted this revision.Dec 14 2022, 10:10 AM

This revision was landed with ongoing or failed builds.Dec 16 2022, 10:20 PM

Closed by commit rGb25b4c0ab4ad: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI (authored by cdevadas).

This revision was automatically updated to reflect the committed changes.

cdevadas added a commit: rGb25b4c0ab4ad: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI.

Revision Contents

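Path	Size
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp	50 lines
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp	2 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h	35 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	133 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	4 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll	6 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll	24 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll	6 lines
llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll	6 lines
llvm/test/CodeGen/AMDGPU/amdpal-callable.ll	20 lines
llvm/test/CodeGen/AMDGPU/bf16.ll	142 lines
llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll	16 lines
llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll	16 lines
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll	36 lines
llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll	24 lines
llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll	6 lines
llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll	6 lines
llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll	8 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll	4415 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll	341 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll	76 lines
llvm/test/CodeGen/AMDGPU/indirect-call.ll	80 lines
llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll	6 lines
llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll	12 lines
llvm/test/CodeGen/AMDGPU/nested-calls.ll	6 lines
llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll	6 lines
llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir	6 lines
llvm/test/CodeGen/AMDGPU/save-fp.ll	4 lines
llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll	8 lines
llvm/test/CodeGen/AMDGPU/sibling-call.ll	8 lines
llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll	6 lines
llvm/test/CodeGen/AMDGPU/stack-realign.ll	10 lines
llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll	4 lines
llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll	155 lines
llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll	9 lines
llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll	54 lines
llvm/test/CodeGen/AMDGPU/wave32.ll	9 lines
llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll	16 lines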

Diff 482885

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

@@ ... @@ static void getVGPRSpillLaneOrTempRegister(MachineFunction &MF,
   SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
   MachineFrameInfo &FrameInfo = MF.getFrameInfo();

   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   const SIRegisterInfo *TRI = ST.getRegisterInfo();

   // We need to save and restore the current FP/BP.

-  // 1: If there is already a VGPR with free lanes, use it. We
-  // may already have to pay the penalty for spilling a CSR VGPR.
-  if (MFI->haveFreeLanesForSGPRSpill(MF, 1)) {
-    int NewFI = FrameInfo.CreateStackObject(4, Align(4), true, nullptr,
-                                            TargetStackID::SGPRSpill);
-
-    if (!MFI->allocateSGPRSpillToVGPR(MF, NewFI))
-      llvm_unreachable("allocate SGPR spill should have worked");
-
-    FrameIndex = NewFI;
-
-    LLVM_DEBUG(auto Spill = MFI->getSGPRToVGPRSpills(NewFI).front();
-               dbgs() << "Spilling " << (IsFP ? "FP" : "BP") << " to "
-                      << printReg(Spill.VGPR, TRI) << ':' << Spill.Lane
-                      << '\n');
-    return;
-  }
-
-  // 2: Next, try to save the FP/BP in an unused SGPR.
+  // 1: Try to save the FP/BP in an unused SGPR.
   TempSGPR = findScratchNonCalleeSaveRegister(
       MF.getRegInfo(), LiveRegs, AMDGPU::SReg_32_XM0_XEXECRegClass, true);

   if (!TempSGPR) {
     int NewFI = FrameInfo.CreateStackObject(4, Align(4), true, nullptr,
                                             TargetStackID::SGPRSpill);

-    if (TRI->spillSGPRToVGPR() && MFI->allocateSGPRSpillToVGPR(MF, NewFI)) {
-      // 3: There's no free lane to spill, and no free register to save FP/BP,
+    if (TRI->spillSGPRToVGPR() && MFI->allocateSGPRSpillToVGPRLane(
+                                      MF, NewFI, /* IsPrologEpilog */ true)) {
+      // 2: There's no free lane to spill, and no free register to save FP/BP,
       // so we're forced to spill another VGPR to use for the spill.
-      auto Spill = MFI->getSGPRToVGPRSpills(NewFI).front();
-      MFI->allocateWWMSpill(MF, Spill.VGPR);
-
       FrameIndex = NewFI;

       LLVM_DEBUG(
+          auto Spill = MFI->getPrologEpilogSGPRSpillToVGPRLanes(NewFI).front();
           dbgs() << (IsFP ? "FP" : "BP") << " requires fallback spill to "
                  << printReg(Spill.VGPR, TRI) << ':' << Spill.Lane << '\n';);
     } else {
       // Remove dead <NewFI> index
       MF.getFrameInfo().RemoveStackObject(NewFI);
-      // 4: If all else fails, spill the FP/BP to memory.
+      // 3: If all else fails, spill the FP/BP to memory.
       FrameIndex = FrameInfo.CreateSpillStackObject(4, Align(4));
       LLVM_DEBUG(dbgs() << "Reserved FI " << FrameIndex << " for spilling "
                         << (IsFP ? "FP" : "BP") << '\n');
     }
   } else {
     LLVM_DEBUG(dbgs() << "Saving " << (IsFP ? "FP" : "BP") << " with copy to "
                       << printReg(TempSGPR, TRI) << '\n');
   }
@@ ... @@ buildPrologSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBB, MBBI, DL, TmpVGPR,
                      FI);
   };

   auto SaveSGPRToVGPRLane = [&](Register Reg, const int FI) {
     assert(!MFI.isDeadObjectIndex(FI));

     assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
     ArrayRef<SIRegisterInfo::SpilledReg> Spill =
-        FuncInfo->getSGPRToVGPRSpills(FI);
+        FuncInfo->getPrologEpilogSGPRSpillToVGPRLanes(FI);
     assert(Spill.size() == 1);

     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_WRITELANE_B32), Spill[0].VGPR)
         .addReg(Reg)
         .addImm(Spill[0].Lane)
         .addReg(Spill[0].VGPR, RegState::Undef);
   };

@@ ... @@ buildEpilogRestore(ST, TRI, *FuncInfo, LiveRegs, MF, MBB, MBBI, DL, TmpVGPR,
                        FI);
     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), Reg)
         .addReg(TmpVGPR, RegState::Kill);
   };

   auto RestoreSGPRFromVGPRLane = [&](Register Reg, const int FI) {
     assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
     ArrayRef<SIRegisterInfo::SpilledReg> Spill =
-        FuncInfo->getSGPRToVGPRSpills(FI);
+        FuncInfo->getPrologEpilogSGPRSpillToVGPRLanes(FI);
     assert(Spill.size() == 1);
     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READLANE_B32), Reg)
         .addReg(Spill[0].VGPR)
         .addImm(Spill[0].Lane);
   };

   if (FPSaveIndex) {
     const int FramePtrFI = *FPSaveIndex;
@@ ... @@ for (MachineInstr &MI : MBB) {
       // that won't really need any such special handling.
       if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32)
         MFI->allocateWWMSpill(MF, MI.getOperand(0).getReg());
       else if (MI.getOpcode() == AMDGPU::V_READLANE_B32)
         MFI->allocateWWMSpill(MF, MI.getOperand(1).getReg());
     }
   }

-  for (MachineBasicBlock &MBB : MF) {
-    for (auto &Reg : MFI->getWWMSpills())
-      MBB.addLiveIn(Reg.first);
-
-    MBB.sortUniqueLiveIns();
-  }
-
   // Ignore the SGPRs the default implementation found.
   SavedVGPRs.clearBitsNotInMask(TRI->getAllVectorRegMask());

   // Do not save AGPRs prior to GFX90A because there was no easy way to do so.
   // In gfx908 there was do AGPR loads and stores and thus spilling also
   // require a temporary VGPR.
   if (!ST.hasGFX90AInsts())
     SavedVGPRs.clearBitsInMask(TRI->getAllAGPRRegMask());
@@ ... @@ if (TRI->hasBasePointer(MF)) {
     if (MFI->SGPRForFPSaveRestoreCopy)
       LiveRegs.addReg(MFI->SGPRForFPSaveRestoreCopy);

     assert(!MFI->SGPRForBPSaveRestoreCopy &&
            !MFI->BasePointerSaveIndex && "Re-reserving spill slot for BP");
     getVGPRSpillLaneOrTempRegister(MF, LiveRegs, MFI->SGPRForBPSaveRestoreCopy,
                                    MFI->BasePointerSaveIndex, false);
   }
+
+  // Mark all lane VGPRs as BB LiveIns.
+  for (MachineBasicBlock &MBB : MF) {
+    for (auto &Reg : MFI->getWWMSpills())
+      MBB.addLiveIn(Reg.first);
+
+    MBB.sortUniqueLiveIns();
+  }
[Inline comment] arsenm: Actually, do we really need to do this anymore? If they were allocated from virtual registers, they should have correct live-in lists already.
[Inline comment] cdevadas: They are needed for prolog/epilog spill insertion. If we don't mark them liveIn, there will be a MIR verifier error indicating the use of undefined registers in spill instructions.
[Inline comment] arsenm: This feels too coarse-grained. The whole point of doing this was to allocate these like normal virtual registers, which should then have naturally set live-ins already. Is this only handling the prolog/epilog cases? It should only need to do anything for those.
[Inline comment] cdevadas: Yes, they are needed only for prolog/epilog spill cases.
[Inline comment] arsenm: But getWWMSpills covers everything? Is this adding excess live-ins?
[Inline comment] cdevadas: It doesn't necessarily add the live-ins at the entry block. We insert the spill to a virt-VGPR at a block, properly adding the IMPLICIT_DEF at its dominator block. The physical VGPR allocated for this virt-VGPR should be added to the prolog block live-ins; otherwise the verifier would complain about its spill store using an undefined register.
 }

 void SIFrameLowering::determineCalleeSavesSGPR(MachineFunction &MF,
                                                BitVector &SavedRegs,
                                                RegScavenger *RS) const {
   TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
   const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
   if (MFI->isEntryFunction())

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

Show First 20 Lines • Show All 290 Lines • ▼ Show 20 Lines	if (HasSGPRSpillToVGPR) {

for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {		for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
if (!TII->isSGPRSpill(MI))		if (!TII->isSGPRSpill(MI))
continue;		continue;

int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();		int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();
assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);		assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
if (FuncInfo->allocateSGPRSpillToVGPR(MF, FI)) {		if (FuncInfo->allocateSGPRSpillToVGPRLane(MF, FI)) {
NewReservedRegs = true;		NewReservedRegs = true;
bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(		bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(
MI, FI, nullptr, Indexes, LIS);		MI, FI, nullptr, Indexes, LIS);
(void)Spilled;		(void)Spilled;
assert(Spilled && "failed to spill SGPR to VGPR when allocated");		assert(Spilled && "failed to spill SGPR to VGPR when allocated");
SpillFIs.set(FI);		SpillFIs.set(FI);
}		}
}		}
[40 lines not shown]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

[431 lines not shown]

public:		public:
struct VGPRSpillToAGPR {		struct VGPRSpillToAGPR {
SmallVector<MCPhysReg, 32> Lanes;		SmallVector<MCPhysReg, 32> Lanes;
bool FullyAllocated = false;		bool FullyAllocated = false;
bool IsDead = false;		bool IsDead = false;
};		};

private:		private:
// Track VGPR + wave index for each subregister of the SGPR spilled to		// To track VGPR + lane index for each subregister of the SGPR spilled to
		nhaehnle: s/wave index/lane index/?
// frameindex key.		// frameindex key during SILowerSGPRSpills pass.
		arsenm: Needs to document what "custom" means. Also, the fact that it's not serialized makes me nervous. However, that should be OK since this is only set and read in PEI. Ideally we would have somewhere else to put it.
		nhaehnle: Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, so name it accordingly.
DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRToVGPRSpills;		DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRSpillToVGPRLanes;
		// To track VGPR + lane index for spilling special SGPRs like Frame Pointer
		// identified during PrologEpilogInserter.
		DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>>
		PrologEpilogSGPRSpillToVGPRLanes;
unsigned NumVGPRSpillLanes = 0;		unsigned NumVGPRSpillLanes = 0;
		unsigned NumVGPRPrologEpilogSpillLanes = 0;
		arsenm: Isn't this count implied by SGPRToVGPRCustomSpills.size()? I'd like to avoid multiplying the number of unserialized fields.
		cdevadas (Author): Currently, the function `SIMachineFunctionInfo::allocateSGPRSpillToVGPR` needs this variable to keep the spill-lane count for the SILowerSGPRSpills pass separate from the count for the custom spills made later during FrameLowering. I'm planning to move these functions entirely out of SIMachineFunctionInfo, which would avoid these variables altogether.
SmallVector<Register, 2> SpillVGPRs;		SmallVector<Register, 2> SpillVGPRs;
using WWMSpillsMap = MapVector<Register, int>;		using WWMSpillsMap = MapVector<Register, int>;
// To track the registers used in instructions that can potentially modify the		// To track the registers used in instructions that can potentially modify the
// inactive lanes. The WWM instructions and the writelane instructions for		// inactive lanes. The WWM instructions and the writelane instructions for
// spilling SGPRs to VGPRs fall under such category of operations. The VGPRs		// spilling SGPRs to VGPRs fall under such category of operations. The VGPRs
// modified by them should be spilled/restored at function prolog/epilog to		// modified by them should be spilled/restored at function prolog/epilog to
// avoid any undesired outcome. Each entry in this map holds a pair of values,		// avoid any undesired outcome. Each entry in this map holds a pair of values,
// the VGPR and its stack slot index.		// the VGPR and its stack slot index.
[16 lines not shown]	private:

// Emergency stack slot. Sometimes, we create this before finalizing the stack		// Emergency stack slot. Sometimes, we create this before finalizing the stack
// frame, so save it here and add it to the RegScavenger later.		// frame, so save it here and add it to the RegScavenger later.
std::optional<int> ScavengeFI;		std::optional<int> ScavengeFI;

private:		private:
Register VGPRForAGPRCopy;		Register VGPRForAGPRCopy;

		bool allocateVGPRForSGPRSpills(MachineFunction &MF, int FI,
		unsigned LaneIndex);
		bool allocateVGPRForPrologEpilogSGPRSpills(MachineFunction &MF, int FI,
		unsigned LaneIndex);

public:		public:
Register getVGPRForAGPRCopy() const {		Register getVGPRForAGPRCopy() const {
return VGPRForAGPRCopy;		return VGPRForAGPRCopy;
}		}

void setVGPRForAGPRCopy(Register NewVGPRForAGPRCopy) {		void setVGPRForAGPRCopy(Register NewVGPRForAGPRCopy) {
VGPRForAGPRCopy = NewVGPRForAGPRCopy;		VGPRForAGPRCopy = NewVGPRForAGPRCopy;
}		}
[27 lines not shown]	public:

void reserveWWMRegister(Register Reg) { WWMReservedRegs.insert(Reg); }		void reserveWWMRegister(Register Reg) { WWMReservedRegs.insert(Reg); }

AMDGPU::SIModeRegisterDefaults getMode() const {		AMDGPU::SIModeRegisterDefaults getMode() const {
return Mode;		return Mode;
}		}

ArrayRef<SIRegisterInfo::SpilledReg>		ArrayRef<SIRegisterInfo::SpilledReg>
getSGPRToVGPRSpills(int FrameIndex) const {		getSGPRSpillToVGPRLanes(int FrameIndex) const {
auto I = SGPRToVGPRSpills.find(FrameIndex);		auto I = SGPRSpillToVGPRLanes.find(FrameIndex);
return (I == SGPRToVGPRSpills.end())		return (I == SGPRSpillToVGPRLanes.end())
? ArrayRef<SIRegisterInfo::SpilledReg>()		? ArrayRef<SIRegisterInfo::SpilledReg>()
: makeArrayRef(I->second);		: makeArrayRef(I->second);
}		}

ArrayRef<Register> getSGPRSpillVGPRs() const { return SpillVGPRs; }		ArrayRef<Register> getSGPRSpillVGPRs() const { return SpillVGPRs; }
const WWMSpillsMap &getWWMSpills() const { return WWMSpills; }		const WWMSpillsMap &getWWMSpills() const { return WWMSpills; }
const ReservedRegSet &getWWMReservedRegs() const { return WWMReservedRegs; }		const ReservedRegSet &getWWMReservedRegs() const { return WWMReservedRegs; }

		ArrayRef<SIRegisterInfo::SpilledReg>
		getPrologEpilogSGPRSpillToVGPRLanes(int FrameIndex) const {
		auto I = PrologEpilogSGPRSpillToVGPRLanes.find(FrameIndex);
		return (I == PrologEpilogSGPRSpillToVGPRLanes.end())
		? ArrayRef<SIRegisterInfo::SpilledReg>()
		: makeArrayRef(I->second);
		}

void allocateWWMSpill(MachineFunction &MF, Register VGPR, uint64_t Size = 4,		void allocateWWMSpill(MachineFunction &MF, Register VGPR, uint64_t Size = 4,
Align Alignment = Align(4));		Align Alignment = Align(4));

ArrayRef<MCPhysReg> getAGPRSpillVGPRs() const {		ArrayRef<MCPhysReg> getAGPRSpillVGPRs() const {
return SpillAGPR;		return SpillAGPR;
}		}

ArrayRef<MCPhysReg> getVGPRSpillAGPRs() const {		ArrayRef<MCPhysReg> getVGPRSpillAGPRs() const {
return SpillVGPR;		return SpillVGPR;
}		}

MCPhysReg getVGPRToAGPRSpill(int FrameIndex, unsigned Lane) const {		MCPhysReg getVGPRToAGPRSpill(int FrameIndex, unsigned Lane) const {
auto I = VGPRToAGPRSpills.find(FrameIndex);		auto I = VGPRToAGPRSpills.find(FrameIndex);
return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister		return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister
: I->second.Lanes[Lane];		: I->second.Lanes[Lane];
}		}

void setVGPRToAGPRSpillDead(int FrameIndex) {		void setVGPRToAGPRSpillDead(int FrameIndex) {
auto I = VGPRToAGPRSpills.find(FrameIndex);		auto I = VGPRToAGPRSpills.find(FrameIndex);
if (I != VGPRToAGPRSpills.end())		if (I != VGPRToAGPRSpills.end())
I->second.IsDead = true;		I->second.IsDead = true;
}		}

bool haveFreeLanesForSGPRSpill(const MachineFunction &MF,		bool allocateSGPRSpillToVGPRLane(MachineFunction &MF, int FI,
unsigned NumLane) const;		bool IsPrologEpilog = false);
bool allocateSGPRSpillToVGPR(MachineFunction &MF, int FI);
bool allocateVGPRSpillToAGPR(MachineFunction &MF, int FI, bool isAGPRtoVGPR);		bool allocateVGPRSpillToAGPR(MachineFunction &MF, int FI, bool isAGPRtoVGPR);

/// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill		/// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill
/// to the default stack.		/// to the default stack.
bool removeDeadFrameIndices(MachineFrameInfo &MFI,		bool removeDeadFrameIndices(MachineFrameInfo &MFI,
bool ResetSGPRSpillStackIDs);		bool ResetSGPRSpillStackIDs);

int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);		int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);
[390 lines not shown]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

[284 lines not shown]	bool SIMachineFunctionInfo::isCalleeSavedReg(const MCPhysReg *CSRegs,
for (unsigned I = 0; CSRegs[I]; ++I) {		for (unsigned I = 0; CSRegs[I]; ++I) {
if (CSRegs[I] == Reg)		if (CSRegs[I] == Reg)
return true;		return true;
}		}

return false;		return false;
}		}

/// \p returns true if \p NumLanes slots are available in VGPRs already used for		bool SIMachineFunctionInfo::allocateVGPRForSGPRSpills(MachineFunction &MF,
/// SGPR spilling.		int FI,
//		unsigned LaneIndex) {
// FIXME: This only works after processFunctionBeforeFrameFinalized
bool SIMachineFunctionInfo::haveFreeLanesForSGPRSpill(const MachineFunction &MF,
unsigned NumNeed) const {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
unsigned WaveSize = ST.getWavefrontSize();		const SIRegisterInfo *TRI = ST.getRegisterInfo();
return NumVGPRSpillLanes + NumNeed <= WaveSize * SpillVGPRs.size();		MachineRegisterInfo &MRI = MF.getRegInfo();
		Register LaneVGPR;
		if (!LaneIndex) {
		LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);
		if (LaneVGPR == AMDGPU::NoRegister) {
		// We have no VGPRs left for spilling SGPRs. Reset because we will not
		// partially spill the SGPR to VGPRs.
		SGPRSpillToVGPRLanes.erase(FI);
		return false;
		}

		SpillVGPRs.push_back(LaneVGPR);
		// Add this register as live-in to all blocks to avoid machine verifier
		// complaining about use of an undefined physical register.
		for (MachineBasicBlock &BB : MF)
		BB.addLiveIn(LaneVGPR);
		} else {
		arsenm: I think this referenced error cannot happen anymore.
		LaneVGPR = SpillVGPRs.back();
}		}

/// Reserve a slice of a VGPR to support spilling for FrameIndex \p FI.		SGPRSpillToVGPRLanes[FI].push_back(
bool SIMachineFunctionInfo::allocateSGPRSpillToVGPR(MachineFunction &MF,		SIRegisterInfo::SpilledReg(LaneVGPR, LaneIndex));
int FI) {		return true;
std::vector<SIRegisterInfo::SpilledReg> &SpillLanes = SGPRToVGPRSpills[FI];		}

		bool SIMachineFunctionInfo::allocateVGPRForPrologEpilogSGPRSpills(
		MachineFunction &MF, int FI, unsigned LaneIndex) {
		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIRegisterInfo *TRI = ST.getRegisterInfo();
		MachineRegisterInfo &MRI = MF.getRegInfo();
		Register LaneVGPR;
		if (!LaneIndex) {
		LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);
		if (LaneVGPR == AMDGPU::NoRegister) {
		// We have no VGPRs left for spilling SGPRs. Reset because we will not
		// partially spill the SGPR to VGPRs.
		PrologEpilogSGPRSpillToVGPRLanes.erase(FI);
		return false;
		}

		allocateWWMSpill(MF, LaneVGPR);
		} else {
		LaneVGPR = WWMSpills.back().first;
		}

		PrologEpilogSGPRSpillToVGPRLanes[FI].push_back(
		SIRegisterInfo::SpilledReg(LaneVGPR, LaneIndex));
		return true;
		}

		bool SIMachineFunctionInfo::allocateSGPRSpillToVGPRLane(MachineFunction &MF,
		int FI,
		arsenm: IsPEI feels like the wrong name. IsPrologEpilog would be a bit better, but not great.
		bool IsPrologEpilog) {
		std::vector<SIRegisterInfo::SpilledReg> &SpillLanes =
		arsenm: Can we defer this until after all the spills are handled?
		IsPrologEpilog ? PrologEpilogSGPRSpillToVGPRLanes[FI]
		: SGPRSpillToVGPRLanes[FI];

// This has already been allocated.		// This has already been allocated.
if (!SpillLanes.empty())		if (!SpillLanes.empty())
return true;		return true;

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIRegisterInfo *TRI = ST.getRegisterInfo();		const SIRegisterInfo *TRI = ST.getRegisterInfo();
MachineFrameInfo &FrameInfo = MF.getFrameInfo();		MachineFrameInfo &FrameInfo = MF.getFrameInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();
unsigned WaveSize = ST.getWavefrontSize();		unsigned WaveSize = ST.getWavefrontSize();

unsigned Size = FrameInfo.getObjectSize(FI);		unsigned Size = FrameInfo.getObjectSize(FI);
unsigned NumLanes = Size / 4;		unsigned NumLanes = Size / 4;

if (NumLanes > WaveSize)		if (NumLanes > WaveSize)
return false;		return false;

assert(Size >= 4 && "invalid sgpr spill size");		assert(Size >= 4 && "invalid sgpr spill size");
assert(TRI->spillSGPRToVGPR() && "not spilling SGPRs to VGPRs");		assert(TRI->spillSGPRToVGPR() && "not spilling SGPRs to VGPRs");

// Make sure to handle the case where a wide SGPR spill may span between two		unsigned &NumSpillLanes =
// VGPRs.		IsPrologEpilog ? NumVGPRPrologEpilogSpillLanes : NumVGPRSpillLanes;
for (unsigned I = 0; I < NumLanes; ++I, ++NumVGPRSpillLanes) {
Register LaneVGPR;
unsigned VGPRIndex = (NumVGPRSpillLanes % WaveSize);

if (VGPRIndex == 0) {		for (unsigned I = 0; I < NumLanes; ++I, ++NumSpillLanes) {
LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);		unsigned LaneIndex = (NumSpillLanes % WaveSize);
if (LaneVGPR == AMDGPU::NoRegister) {
// We have no VGPRs left for spilling SGPRs. Reset because we will not
// partially spill the SGPR to VGPRs.
SGPRToVGPRSpills.erase(FI);
NumVGPRSpillLanes -= I;

// FIXME: We can run out of free registers with split allocation if		bool Allocated =
// IPRA is enabled and a called function already uses every VGPR.		IsPrologEpilog
#if 0		? allocateVGPRForPrologEpilogSGPRSpills(MF, FI, LaneIndex)
DiagnosticInfoResourceLimit DiagOutOfRegs(MF.getFunction(),		: allocateVGPRForSGPRSpills(MF, FI, LaneIndex);
"VGPRs for SGPR spilling",		if (!Allocated) {
0, DS_Error);		NumSpillLanes -= I;
MF.getFunction().getContext().diagnose(DiagOutOfRegs);
#endif
return false;		return false;
}		}

SpillVGPRs.push_back(LaneVGPR);

// Add this register as live-in to all blocks to avoid machine verifier
// complaining about use of an undefined physical register.
for (MachineBasicBlock &BB : MF)
BB.addLiveIn(LaneVGPR);
} else {
LaneVGPR = SpillVGPRs.back();
}

SpillLanes.push_back(SIRegisterInfo::SpilledReg(LaneVGPR, VGPRIndex));
}		}

return true;		return true;
}		}

/// Reserve AGPRs or VGPRs to support spilling for FrameIndex \p FI.		/// Reserve AGPRs or VGPRs to support spilling for FrameIndex \p FI.
/// Either AGPR is spilled to VGPR to vice versa.		/// Either AGPR is spilled to VGPR to vice versa.
/// Returns true if a \p FI can be eliminated completely.		/// Returns true if a \p FI can be eliminated completely.
[58 lines not shown]	for (int I = NumLanes - 1; I >= 0; --I) {
SpillRegs.push_back(*NextSpillReg);		SpillRegs.push_back(*NextSpillReg);
Spill.Lanes[I] = *NextSpillReg++;		Spill.Lanes[I] = *NextSpillReg++;
}		}

return Spill.FullyAllocated;		return Spill.FullyAllocated;
}		}

bool SIMachineFunctionInfo::removeDeadFrameIndices(		bool SIMachineFunctionInfo::removeDeadFrameIndices(
MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {		MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {
// Remove dead frame indices from function frame, however keep FP & BP since		// Remove dead frame indices from function frame. And also make sure to remove
// spills for them haven't been inserted yet. And also make sure to remove the		// the frame indices from `SGPRSpillToVGPRLanes` data structure, otherwise, it
// frame indices from `SGPRToVGPRSpills` data structure, otherwise, it could		// could result in an unexpected side effect and bug, in case of any
// result in an unexpected side effect and bug, in case of any re-mapping of		// re-mapping of freed frame indices by later pass(es) like "stack slot
// freed frame indices by later pass(es) like "stack slot coloring".		// coloring".
for (auto &R : make_early_inc_range(SGPRToVGPRSpills)) {		for (auto &R : make_early_inc_range(SGPRSpillToVGPRLanes)) {
if (R.first != FramePointerSaveIndex && R.first != BasePointerSaveIndex) {
MFI.RemoveStackObject(R.first);		MFI.RemoveStackObject(R.first);
SGPRToVGPRSpills.erase(R.first);		SGPRSpillToVGPRLanes.erase(R.first);
}
}		}
		nhaehnle: Simplify this further to a simpler for loop and finally `SGPRToVGPRSpills.clear()`.
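One possible reading of nhaehnle's suggestion, as a minimal sketch and not the committed code: once the FP/BP special case is gone, nothing is erased selectively, so the loop can simply visit every entry and the map can be cleared afterwards. `ToyFrameInfo` and `removeSpillFrameIndices` are hypothetical stand-ins for `MachineFrameInfo` and the loop inside `removeDeadFrameIndices`, and `std::map` replaces `DenseMap` so the snippet compiles without LLVM.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for MachineFrameInfo: it only records which frame
// indices have been marked as removed.
struct ToyFrameInfo {
  std::set<int> Removed;
  void RemoveStackObject(int FI) { Removed.insert(FI); }
};

// Hypothetical helper: mark every tracked spill frame index as removed, then
// drop all bookkeeping in one step instead of erasing while iterating.
void removeSpillFrameIndices(ToyFrameInfo &MFI,
                             std::map<int, std::vector<unsigned>> &Spills) {
  for (const auto &R : Spills)
    MFI.RemoveStackObject(R.first);
  Spills.clear();
  assert(Spills.empty() && "all spill entries should be gone");
}
```

Because no entry is erased inside the loop, an early-increment range is no longer needed in this form.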

bool HaveSGPRToMemory = false;		bool HaveSGPRToMemory = false;

if (ResetSGPRSpillStackIDs) {		if (ResetSGPRSpillStackIDs) {
// All other SPGRs must be allocated on the default stack, so reset the		// All other SPGRs must be allocated on the default stack, so reset the
// stack ID.		// stack ID.
for (int i = MFI.getObjectIndexBegin(), e = MFI.getObjectIndexEnd(); i != e;		for (int i = MFI.getObjectIndexBegin(), e = MFI.getObjectIndexEnd(); i != e;
++i) {		++i) {
[267 lines not shown]
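The all-or-nothing behaviour of `allocateSGPRSpillToVGPRLane` (erase the frame index entry and roll the lane counter back by `I` when no VGPR is left) can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the LLVM implementation: `ToySpillState`, `Lane`, and `allocateSpill` are hypothetical names, and a simple `FreeVGPRs` budget stands in for `findUnusedRegister`.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for SIRegisterInfo::SpilledReg.
struct Lane {
  unsigned VGPR;
  unsigned Index;
};

// Toy model of the bookkeeping kept in SIMachineFunctionInfo.
struct ToySpillState {
  unsigned WaveSize = 64;
  unsigned NumSpillLanes = 0; // running count of lanes handed out
  unsigned FreeVGPRs = 0;     // assumed budget replacing findUnusedRegister
  unsigned NextVGPR = 0;      // id of the next VGPR to hand out
  std::vector<unsigned> SpillVGPRs;
  std::map<int, std::vector<Lane>> LanesByFI;
};

// Allocate NumLanes lanes for frame index FI, or nothing at all.
bool allocateSpill(ToySpillState &S, int FI, unsigned NumLanes) {
  assert(NumLanes > 0 && "invalid spill size");
  std::vector<Lane> &Lanes = S.LanesByFI[FI];
  if (!Lanes.empty())
    return true; // already allocated
  for (unsigned I = 0; I < NumLanes; ++I, ++S.NumSpillLanes) {
    unsigned LaneIndex = S.NumSpillLanes % S.WaveSize;
    if (LaneIndex == 0) {
      // Lane 0 means the previous VGPR is full, so a new one is required.
      if (S.FreeVGPRs == 0) {
        // Undo the partial allocation: drop the entry and give back the
        // I lanes that were already counted for this frame index.
        S.LanesByFI.erase(FI);
        S.NumSpillLanes -= I;
        return false;
      }
      --S.FreeVGPRs;
      S.SpillVGPRs.push_back(S.NextVGPR++);
    }
    Lanes.push_back({S.SpillVGPRs.back(), LaneIndex});
  }
  return true;
}
```

In this model a failed request leaves the lane counter exactly where it was before the call, so a later, smaller spill can still use the remaining lanes of the last VGPR.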

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

[1,699 lines not shown]	void SIRegisterInfo::buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index,
}		}
}		}

bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI, int Index,		bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI, int Index,
RegScavenger RS, SlotIndexes Indexes,		RegScavenger RS, SlotIndexes Indexes,
LiveIntervals *LIS, bool OnlyToVGPR) const {		LiveIntervals *LIS, bool OnlyToVGPR) const {
SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);		SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);

ArrayRef<SpilledReg> VGPRSpills = SB.MFI.getSGPRToVGPRSpills(Index);		ArrayRef<SpilledReg> VGPRSpills = SB.MFI.getSGPRSpillToVGPRLanes(Index);
bool SpillToVGPR = !VGPRSpills.empty();		bool SpillToVGPR = !VGPRSpills.empty();
if (OnlyToVGPR && !SpillToVGPR)		if (OnlyToVGPR && !SpillToVGPR)
return false;		return false;

assert(SpillToVGPR \|\| (SB.SuperReg != SB.MFI.getStackPtrOffsetReg() &&		assert(SpillToVGPR \|\| (SB.SuperReg != SB.MFI.getStackPtrOffsetReg() &&
SB.SuperReg != SB.MFI.getFrameOffsetReg()));		SB.SuperReg != SB.MFI.getFrameOffsetReg()));

if (SpillToVGPR) {		if (SpillToVGPR) {
[100 lines not shown]	bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI, int Index,
return true;		return true;
}		}

bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI, int Index,		bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI, int Index,
RegScavenger RS, SlotIndexes Indexes,		RegScavenger RS, SlotIndexes Indexes,
LiveIntervals *LIS, bool OnlyToVGPR) const {		LiveIntervals *LIS, bool OnlyToVGPR) const {
SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);		SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);

ArrayRef<SpilledReg> VGPRSpills = SB.MFI.getSGPRToVGPRSpills(Index);		ArrayRef<SpilledReg> VGPRSpills = SB.MFI.getSGPRSpillToVGPRLanes(Index);
bool SpillToVGPR = !VGPRSpills.empty();		bool SpillToVGPR = !VGPRSpills.empty();
if (OnlyToVGPR && !SpillToVGPR)		if (OnlyToVGPR && !SpillToVGPR)
return false;		return false;

if (SpillToVGPR) {		if (SpillToVGPR) {
for (unsigned i = 0, e = SB.NumSubRegs; i < e; ++i) {		for (unsigned i = 0, e = SB.NumSubRegs; i < e; ++i) {
Register SubReg =		Register SubReg =
SB.NumSubRegs == 1		SB.NumSubRegs == 1
[1,429 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s \| FileCheck %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s \| FileCheck %s

	declare hidden ptr addrspace(1) @ext(ptr addrspace(1))			declare hidden ptr addrspace(1) @ext(ptr addrspace(1))

	define ptr addrspace(1) @call_assert_align() {			define ptr addrspace(1) @call_assert_align() {
	; CHECK-LABEL: call_assert_align:			; CHECK-LABEL: call_assert_align:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1			; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				foad: Seems like a regression. Does this get fixed by a later patch?
				cdevadas (Author): Yes, it is. With spilling SGPRs into virtual VGPR lanes, it won't be directly possible to track the unused lanes of the physical VGPR allocated for the last virtual register created during the `SILowerSGPRSpills` pass. I'm going to insert a custom pass in the VGPR regalloc pipeline to map the physReg from virtRegMap. That way, we can reuse the VGPR for any custom SGPR spills during PEI if free lanes are available. However, this regression can only be avoided at higher optimization levels. `regallocfast` doesn't provide a way to correctly map a virtual register to a PhysReg, so we can't avoid this extra VGPR usage when compiling at -O0.
				arsenm: I'm not sure a separate pass using VirtRegMap is the right solution to merging spill VGPRs of different SGPRs, but either way this is a separate optimization that needs to be re-implemented.
				cdevadas (Author): It's worth implementing when it comes to saving a VGPR. Yep, planning it as a separate patch.
	; CHECK-NEXT: s_mov_b64 exec, s[16:17]			; CHECK-NEXT: s_mov_b64 exec, s[16:17]
	; CHECK-NEXT: v_writelane_b32 v40, s33, 2			; CHECK-NEXT: v_writelane_b32 v41, s33, 0
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_addk_i32 s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[16:17]			; CHECK-NEXT: s_getpc_b64 s[16:17]
	; CHECK-NEXT: s_add_u32 s16, s16, ext@rel32@lo+4			; CHECK-NEXT: s_add_u32 s16, s16, ext@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s17, s17, ext@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s17, s17, ext@rel32@hi+12
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: v_mov_b32_e32 v2, 0			; CHECK-NEXT: v_mov_b32_e32 v2, 0
	; CHECK-NEXT: global_store_dword v[0:1], v2, off			; CHECK-NEXT: global_store_dword v[0:1], v2, off
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: s_addk_i32 s32, 0xfc00			; CHECK-NEXT: s_addk_i32 s32, 0xfc00
	; CHECK-NEXT: v_readlane_b32 s33, v40, 2			; CHECK-NEXT: v_readlane_b32 s33, v41, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call align 4 ptr addrspace(1) @ext(ptr addrspace(1) null)			%call = call align 4 ptr addrspace(1) @ext(ptr addrspace(1) null)
	store volatile i32 0, ptr addrspace(1) %call			store volatile i32 0, ptr addrspace(1) %call
	ret ptr addrspace(1) %call			ret ptr addrspace(1) %call
	}			}
	[15 lines not shown]
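The extra `v41` spill in the CHECK lines above follows from the patch keeping two independent lane counters. A minimal standalone model of that effect, assuming a toy allocator that takes a new VGPR whenever its own counter wraps to lane 0 (`LanePool`, `SpilledLane`, and `allocateLane` are hypothetical names, not LLVM APIs):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SIRegisterInfo::SpilledReg.
struct SpilledLane {
  unsigned VGPR;
  unsigned Lane;
};

// One pool per kind of spill; each pool has its own lane counter, mirroring
// NumVGPRSpillLanes and NumVGPRPrologEpilogSpillLanes in the patch.
struct LanePool {
  unsigned NumLanesUsed = 0;
  std::vector<unsigned> VGPRs;
};

// Hand out the next lane of Pool. A fresh VGPR is taken from the shared
// NextFreeVGPR counter whenever this pool's lane index wraps to 0.
SpilledLane allocateLane(LanePool &Pool, unsigned &NextFreeVGPR,
                         unsigned WaveSize) {
  assert(WaveSize > 0 && "wave size must be non-zero");
  unsigned Lane = Pool.NumLanesUsed % WaveSize;
  if (Lane == 0)
    Pool.VGPRs.push_back(NextFreeVGPR++);
  ++Pool.NumLanesUsed;
  return {Pool.VGPRs.back(), Lane};
}
```

With a single pool, spilling `s30`, `s31`, and then the frame pointer `s33` would land in lanes 0, 1, and 2 of one VGPR, as in the old output (`v_writelane_b32 v40, s33, 2`). With separate pools, the first prolog/epilog spill starts at lane 0 of its own pool and therefore takes a second VGPR (`v_writelane_b32 v41, s33, 0`), even though lanes of the first one are still free.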

llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll

	[229 lines not shown]
	}			}

	define void @func_caller_stack() {			define void @func_caller_stack() {
	; MUBUF-LABEL: func_caller_stack:			; MUBUF-LABEL: func_caller_stack:
	; MUBUF: ; %bb.0:			; MUBUF: ; %bb.0:
	; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: v_writelane_b32 v40, s33, 2			; MUBUF-NEXT: v_writelane_b32 v41, s33, 0
	; MUBUF-NEXT: s_mov_b32 s33, s32			; MUBUF-NEXT: s_mov_b32 s33, s32
	; MUBUF-NEXT: s_addk_i32 s32, 0x400			; MUBUF-NEXT: s_addk_i32 s32, 0x400
	; MUBUF-NEXT: v_mov_b32_e32 v0, 9			; MUBUF-NEXT: v_mov_b32_e32 v0, 9
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; MUBUF-NEXT: v_mov_b32_e32 v0, 10			; MUBUF-NEXT: v_mov_b32_e32 v0, 10
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; MUBUF-NEXT: v_mov_b32_e32 v0, 11			; MUBUF-NEXT: v_mov_b32_e32 v0, 11
	; MUBUF-NEXT: v_writelane_b32 v40, s30, 0			; MUBUF-NEXT: v_writelane_b32 v40, s30, 0
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
	; MUBUF-NEXT: v_mov_b32_e32 v0, 12			; MUBUF-NEXT: v_mov_b32_e32 v0, 12
	; MUBUF-NEXT: v_writelane_b32 v40, s31, 1			; MUBUF-NEXT: v_writelane_b32 v40, s31, 1
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16
	; MUBUF-NEXT: s_getpc_b64 s[4:5]			; MUBUF-NEXT: s_getpc_b64 s[4:5]
	; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4			; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4
	; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12			; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12
	; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]			; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; MUBUF-NEXT: v_readlane_b32 s31, v40, 1			; MUBUF-NEXT: v_readlane_b32 s31, v40, 1
	; MUBUF-NEXT: v_readlane_b32 s30, v40, 0			; MUBUF-NEXT: v_readlane_b32 s30, v40, 0
	; MUBUF-NEXT: s_addk_i32 s32, 0xfc00			; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
	; MUBUF-NEXT: v_readlane_b32 s33, v40, 2			; MUBUF-NEXT: v_readlane_b32 s33, v41, 0
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; MUBUF-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_caller_stack:			; FLATSCR-LABEL: func_caller_stack:
	; FLATSCR: ; %bb.0:			; FLATSCR: ; %bb.0:
	; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; FLATSCR-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: v_writelane_b32 v40, s33, 2			; FLATSCR-NEXT: v_writelane_b32 v41, s33, 0
	; FLATSCR-NEXT: s_mov_b32 s33, s32			; FLATSCR-NEXT: s_mov_b32 s33, s32
	; FLATSCR-NEXT: s_add_i32 s32, s32, 16			; FLATSCR-NEXT: s_add_i32 s32, s32, 16
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 9			; FLATSCR-NEXT: v_mov_b32_e32 v0, 9
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:4			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:4
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 10			; FLATSCR-NEXT: v_mov_b32_e32 v0, 10
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:8			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:8
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 11			; FLATSCR-NEXT: v_mov_b32_e32 v0, 11
	; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0			; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:12			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:12
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 12			; FLATSCR-NEXT: v_mov_b32_e32 v0, 12
	; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1			; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:16			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:16
	; FLATSCR-NEXT: s_getpc_b64 s[0:1]			; FLATSCR-NEXT: s_getpc_b64 s[0:1]
	; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4			; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4
	; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12			; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12
	; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]			; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1			; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1
	; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0			; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0
	; FLATSCR-NEXT: s_add_i32 s32, s32, -16			; FLATSCR-NEXT: s_add_i32 s32, s32, -16
	; FLATSCR-NEXT: v_readlane_b32 s33, v40, 2			; FLATSCR-NEXT: v_readlane_b32 s33, v41, 0
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
				; FLATSCR-NEXT: scratch_load_dword v41, off, s32 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	call void @external_void_func_v16i32_v16i32_v4i32(<16 x i32> undef, <16 x i32> undef, <4 x i32> <i32 9, i32 10, i32 11, i32 12>)			call void @external_void_func_v16i32_v16i32_v4i32(<16 x i32> undef, <16 x i32> undef, <4 x i32> <i32 9, i32 10, i32 11, i32 12>)
	ret void			ret void
	}			}

	define void @func_caller_byval(ptr addrspace(5) %argptr) {			define void @func_caller_byval(ptr addrspace(5) %argptr) {
	; MUBUF-LABEL: func_caller_byval:			; MUBUF-LABEL: func_caller_byval:
	; MUBUF: ; %bb.0:			; MUBUF: ; %bb.0:
	; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: buffer_load_dword v1, v0, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v1, v0, s[0:3], 0 offen
	; MUBUF-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen offset:4
	; MUBUF-NEXT: v_writelane_b32 v40, s33, 2			; MUBUF-NEXT: v_writelane_b32 v41, s33, 0
	; MUBUF-NEXT: s_mov_b32 s33, s32			; MUBUF-NEXT: s_mov_b32 s33, s32
	; MUBUF-NEXT: s_addk_i32 s32, 0x400			; MUBUF-NEXT: s_addk_i32 s32, 0x400
	; MUBUF-NEXT: v_writelane_b32 v40, s30, 0			; MUBUF-NEXT: v_writelane_b32 v40, s30, 0
	; MUBUF-NEXT: v_writelane_b32 v40, s31, 1			; MUBUF-NEXT: v_writelane_b32 v40, s31, 1
	; MUBUF-NEXT: s_getpc_b64 s[4:5]			; MUBUF-NEXT: s_getpc_b64 s[4:5]
	; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_byval@rel32@lo+4			; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_byval@rel32@lo+4
	; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_byval@rel32@hi+12			; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_byval@rel32@hi+12
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	...
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:56			; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:56
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:60			; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:60
	; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]			; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; MUBUF-NEXT: v_readlane_b32 s31, v40, 1			; MUBUF-NEXT: v_readlane_b32 s31, v40, 1
	; MUBUF-NEXT: v_readlane_b32 s30, v40, 0			; MUBUF-NEXT: v_readlane_b32 s30, v40, 0
	; MUBUF-NEXT: s_addk_i32 s32, 0xfc00			; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
	; MUBUF-NEXT: v_readlane_b32 s33, v40, 2			; MUBUF-NEXT: v_readlane_b32 s33, v41, 0
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; MUBUF-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_caller_byval:			; FLATSCR-LABEL: func_caller_byval:
	; FLATSCR: ; %bb.0:			; FLATSCR: ; %bb.0:
	; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; FLATSCR-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off			; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off
	; FLATSCR-NEXT: v_writelane_b32 v40, s33, 2			; FLATSCR-NEXT: v_writelane_b32 v41, s33, 0
	; FLATSCR-NEXT: s_mov_b32 s33, s32			; FLATSCR-NEXT: s_mov_b32 s33, s32
	; FLATSCR-NEXT: s_add_i32 s32, s32, 16			; FLATSCR-NEXT: s_add_i32 s32, s32, 16
	; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0			; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0
	; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1			; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1
	; FLATSCR-NEXT: s_getpc_b64 s[0:1]			; FLATSCR-NEXT: s_getpc_b64 s[0:1]
	; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_byval@rel32@lo+4			; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_byval@rel32@lo+4
	; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_byval@rel32@hi+12			; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_byval@rel32@hi+12
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	...
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s32 offset:48			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s32 offset:48
	; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], v0, off offset:56			; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], v0, off offset:56
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s32 offset:56			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s32 offset:56
	; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]			; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1			; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1
	; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0			; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0
	; FLATSCR-NEXT: s_add_i32 s32, s32, -16			; FLATSCR-NEXT: s_add_i32 s32, s32, -16
	; FLATSCR-NEXT: v_readlane_b32 s33, v40, 2			; FLATSCR-NEXT: v_readlane_b32 s33, v41, 0
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
				; FLATSCR-NEXT: scratch_load_dword v41, off, s32 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	call void @external_void_func_byval(ptr addrspace(5) byval([16 x i32]) %argptr)			call void @external_void_func_byval(ptr addrspace(5) byval([16 x i32]) %argptr)
	ret void			ret void
	}			}

	declare void @llvm.memset.p5.i32(ptr addrspace(5) nocapture writeonly, i8, i32, i1 immarg) #1			declare void @llvm.memset.p5.i32(ptr addrspace(5) nocapture writeonly, i8, i32, i1 immarg) #1

	attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }			attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }
	attributes #1 = { argmemonly nofree nounwind willreturn writeonly }			attributes #1 = { argmemonly nofree nounwind willreturn writeonly }

llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll


	; This would crash from using the wrong insert point			; This would crash from using the wrong insert point
	define void @sink_null_insert_pt(ptr addrspace(4) %arg0) {			define void @sink_null_insert_pt(ptr addrspace(4) %arg0) {
	; GFX9-LABEL: sink_null_insert_pt:			; GFX9-LABEL: sink_null_insert_pt:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[16:17], -1			; GFX9-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[16:17]			; GFX9-NEXT: s_mov_b64 exec, s[16:17]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dword v0, v[0:1], off glc			; GFX9-NEXT: global_load_dword v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], 0			; GFX9-NEXT: s_swappc_b64 s[30:31], 0
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%load0 = load volatile i32, ptr addrspace(1) null, align 4			%load0 = load volatile i32, ptr addrspace(1) null, align 4
	br label %bb1			br label %bb1

	bb1:			bb1:
	call void null()			call void null()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll

	; does not require the implicit arguments to the function. Make sure			; does not require the implicit arguments to the function. Make sure
	; we do not crash.			; we do not crash.
	define void @parent_func_missing_inputs() #0 {			define void @parent_func_missing_inputs() #0 {
	; FIXEDABI-LABEL: parent_func_missing_inputs:			; FIXEDABI-LABEL: parent_func_missing_inputs:
	; FIXEDABI: ; %bb.0:			; FIXEDABI: ; %bb.0:
	; FIXEDABI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FIXEDABI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FIXEDABI-NEXT: s_or_saveexec_b64 s[16:17], -1			; FIXEDABI-NEXT: s_or_saveexec_b64 s[16:17], -1
	; FIXEDABI-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; FIXEDABI-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; FIXEDABI-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; FIXEDABI-NEXT: s_mov_b64 exec, s[16:17]			; FIXEDABI-NEXT: s_mov_b64 exec, s[16:17]
	; FIXEDABI-NEXT: v_writelane_b32 v40, s33, 2			; FIXEDABI-NEXT: v_writelane_b32 v41, s33, 0
	; FIXEDABI-NEXT: s_mov_b32 s33, s32			; FIXEDABI-NEXT: s_mov_b32 s33, s32
	; FIXEDABI-NEXT: s_addk_i32 s32, 0x400			; FIXEDABI-NEXT: s_addk_i32 s32, 0x400
	; FIXEDABI-NEXT: v_writelane_b32 v40, s30, 0			; FIXEDABI-NEXT: v_writelane_b32 v40, s30, 0
	; FIXEDABI-NEXT: v_writelane_b32 v40, s31, 1			; FIXEDABI-NEXT: v_writelane_b32 v40, s31, 1
	; FIXEDABI-NEXT: s_getpc_b64 s[16:17]			; FIXEDABI-NEXT: s_getpc_b64 s[16:17]
	; FIXEDABI-NEXT: s_add_u32 s16, s16, requires_all_inputs@rel32@lo+4			; FIXEDABI-NEXT: s_add_u32 s16, s16, requires_all_inputs@rel32@lo+4
	; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12			; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12
	; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]			; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; FIXEDABI-NEXT: v_readlane_b32 s31, v40, 1			; FIXEDABI-NEXT: v_readlane_b32 s31, v40, 1
	; FIXEDABI-NEXT: v_readlane_b32 s30, v40, 0			; FIXEDABI-NEXT: v_readlane_b32 s30, v40, 0
	; FIXEDABI-NEXT: s_addk_i32 s32, 0xfc00			; FIXEDABI-NEXT: s_addk_i32 s32, 0xfc00
	; FIXEDABI-NEXT: v_readlane_b32 s33, v40, 2			; FIXEDABI-NEXT: v_readlane_b32 s33, v41, 0
	; FIXEDABI-NEXT: s_or_saveexec_b64 s[4:5], -1			; FIXEDABI-NEXT: s_or_saveexec_b64 s[4:5], -1
	; FIXEDABI-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; FIXEDABI-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; FIXEDABI-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; FIXEDABI-NEXT: s_mov_b64 exec, s[4:5]			; FIXEDABI-NEXT: s_mov_b64 exec, s[4:5]
	; FIXEDABI-NEXT: s_waitcnt vmcnt(0)			; FIXEDABI-NEXT: s_waitcnt vmcnt(0)
	; FIXEDABI-NEXT: s_setpc_b64 s[30:31]			; FIXEDABI-NEXT: s_setpc_b64 s[30:31]
	call void @requires_all_inputs()			call void @requires_all_inputs()
	ret void			ret void
	}			}

	define amdgpu_kernel void @parent_kernel_missing_inputs() #0 {			define amdgpu_kernel void @parent_kernel_missing_inputs() #0 {
	...

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll

	; GCN-NEXT: .vgpr_count: 0x3{{$}}			; GCN-NEXT: .vgpr_count: 0x3{{$}}
	; GCN-NEXT: no_stack:			; GCN-NEXT: no_stack:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x20{{$}}			; GCN-NEXT: .sgpr_count: 0x20{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
	; GCN-NEXT: .vgpr_count: 0x1{{$}}			; GCN-NEXT: .vgpr_count: 0x1{{$}}
	; GCN-NEXT: no_stack_call:			; GCN-NEXT: no_stack_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x24{{$}}			; GCN-NEXT: .sgpr_count: 0x25{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x3{{$}}			; GCN-NEXT: .vgpr_count: 0x3{{$}}
	; GCN-NEXT: no_stack_extern_call:			; GCN-NEXT: no_stack_extern_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: no_stack_extern_call_many_args:			; GCN-NEXT: no_stack_extern_call_many_args:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: no_stack_indirect_call:			; GCN-NEXT: no_stack_indirect_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: simple_lds:			; GCN-NEXT: simple_lds:
	; GCN-NEXT: .lds_size: 0x100{{$}}			; GCN-NEXT: .lds_size: 0x100{{$}}
	; GCN-NEXT: .sgpr_count: 0x20{{$}}			; GCN-NEXT: .sgpr_count: 0x20{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
	; GCN-NEXT: .vgpr_count: 0x1{{$}}			; GCN-NEXT: .vgpr_count: 0x1{{$}}
	; GCN-NEXT: simple_lds_recurse:			; GCN-NEXT: simple_lds_recurse:
	; GCN-NEXT: .lds_size: 0x100{{$}}			; GCN-NEXT: .lds_size: 0x100{{$}}
	; GCN-NEXT: .sgpr_count: 0x26{{$}}			; GCN-NEXT: .sgpr_count: 0x26{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x29{{$}}			; GCN-NEXT: .vgpr_count: 0x2a{{$}}
	; GCN-NEXT: simple_stack:			; GCN-NEXT: simple_stack:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x21{{$}}			; GCN-NEXT: .sgpr_count: 0x21{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}
	; GCN-NEXT: .vgpr_count: 0x2{{$}}			; GCN-NEXT: .vgpr_count: 0x2{{$}}
	; GCN-NEXT: simple_stack_call:			; GCN-NEXT: simple_stack_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x24{{$}}			; GCN-NEXT: .sgpr_count: 0x25{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
	; GCN-NEXT: .vgpr_count: 0x4{{$}}			; GCN-NEXT: .vgpr_count: 0x4{{$}}
	; GCN-NEXT: simple_stack_extern_call:			; GCN-NEXT: simple_stack_extern_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: simple_stack_indirect_call:			; GCN-NEXT: simple_stack_indirect_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x30{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: simple_stack_recurse:			; GCN-NEXT: simple_stack_recurse:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x26{{$}}			; GCN-NEXT: .sgpr_count: 0x26{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
	; GCN-NEXT: .vgpr_count: 0x2a{{$}}			; GCN-NEXT: .vgpr_count: 0x2b{{$}}
	; GCN-NEXT: ...			; GCN-NEXT: ...

llvm/test/CodeGen/AMDGPU/bf16.ll


	define void @test_call(bfloat %in, ptr addrspace(5) %out) {			define void @test_call(bfloat %in, ptr addrspace(5) %out) {
	; GCN-LABEL: test_call:			; GCN-LABEL: test_call:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt expcnt(0)			; GCN-NEXT: s_mov_b32 s8, s33
	; GCN-NEXT: v_writelane_b32 v2, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: s_waitcnt expcnt(0)
	; GCN-NEXT: v_writelane_b32 v2, s30, 0			; GCN-NEXT: v_writelane_b32 v2, s30, 0
	; GCN-NEXT: v_writelane_b32 v2, s31, 1			; GCN-NEXT: v_writelane_b32 v2, s31, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GCN-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v2, 1			; GCN-NEXT: v_readlane_b32 s31, v2, 1
	; GCN-NEXT: v_readlane_b32 s30, v2, 0			; GCN-NEXT: v_readlane_b32 s30, v2, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v2, 2			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call:			; GFX7-LABEL: test_call:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX7-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: v_writelane_b32 v2, s33, 2			; GFX7-NEXT: s_mov_b32 s8, s33
	; GFX7-NEXT: s_mov_b32 s33, s32			; GFX7-NEXT: s_mov_b32 s33, s32
	; GFX7-NEXT: s_addk_i32 s32, 0x400			; GFX7-NEXT: s_addk_i32 s32, 0x400
	; GFX7-NEXT: s_getpc_b64 s[4:5]			; GFX7-NEXT: s_getpc_b64 s[4:5]
	; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4			; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4
	; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12			; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX7-NEXT: v_writelane_b32 v2, s30, 0			; GFX7-NEXT: v_writelane_b32 v2, s30, 0
	; GFX7-NEXT: v_writelane_b32 v2, s31, 1			; GFX7-NEXT: v_writelane_b32 v2, s31, 1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX7-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX7-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v2, 1			; GFX7-NEXT: v_readlane_b32 s31, v2, 1
	; GFX7-NEXT: v_readlane_b32 s30, v2, 0			; GFX7-NEXT: v_readlane_b32 s30, v2, 0
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_addk_i32 s32, 0xfc00
	; GFX7-NEXT: v_readlane_b32 s33, v2, 2			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call:			; GFX8-LABEL: test_call:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX8-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: v_writelane_b32 v2, s33, 2			; GFX8-NEXT: s_mov_b32 s6, s33
	; GFX8-NEXT: s_mov_b32 s33, s32			; GFX8-NEXT: s_mov_b32 s33, s32
	; GFX8-NEXT: s_addk_i32 s32, 0x400			; GFX8-NEXT: s_addk_i32 s32, 0x400
	; GFX8-NEXT: s_getpc_b64 s[4:5]			; GFX8-NEXT: s_getpc_b64 s[4:5]
	; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4			; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4
	; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12			; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX8-NEXT: v_writelane_b32 v2, s30, 0			; GFX8-NEXT: v_writelane_b32 v2, s30, 0
	; GFX8-NEXT: v_writelane_b32 v2, s31, 1			; GFX8-NEXT: v_writelane_b32 v2, s31, 1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX8-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX8-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX8-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v2, 1			; GFX8-NEXT: v_readlane_b32 s31, v2, 1
	; GFX8-NEXT: v_readlane_b32 s30, v2, 0			; GFX8-NEXT: v_readlane_b32 s30, v2, 0
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_addk_i32 s32, 0xfc00
	; GFX8-NEXT: v_readlane_b32 s33, v2, 2			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call:			; GFX9-LABEL: test_call:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v2, s33, 2			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v2, 2			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call:			; GFX10-LABEL: test_call:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v2, s33, 2			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v2, 2			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call bfloat @test_arg_store(bfloat %in)			%result = call bfloat @test_arg_store(bfloat %in)
	store volatile bfloat %result, ptr addrspace(5) %out			store volatile bfloat %result, ptr addrspace(5) %out
	ret void			ret void
	}			}

	define void @test_call_v2bf16(<2 x bfloat> %in, ptr addrspace(5) %out) {			define void @test_call_v2bf16(<2 x bfloat> %in, ptr addrspace(5) %out) {
	; GCN-LABEL: test_call_v2bf16:			; GCN-LABEL: test_call_v2bf16:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt expcnt(0)			; GCN-NEXT: s_mov_b32 s8, s33
	; GCN-NEXT: v_writelane_b32 v3, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: s_waitcnt expcnt(0)
	; GCN-NEXT: v_writelane_b32 v3, s30, 0			; GCN-NEXT: v_writelane_b32 v3, s30, 0
	; GCN-NEXT: v_writelane_b32 v3, s31, 1			; GCN-NEXT: v_writelane_b32 v3, s31, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GCN-NEXT: v_add_i32_e32 v4, vcc, 2, v2			; GCN-NEXT: v_add_i32_e32 v4, vcc, 2, v2
	; GCN-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v3, 1			; GCN-NEXT: v_readlane_b32 s31, v3, 1
	; GCN-NEXT: v_readlane_b32 s30, v3, 0			; GCN-NEXT: v_readlane_b32 s30, v3, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v3, 2			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v2bf16:			; GFX7-LABEL: test_call_v2bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX7-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: v_writelane_b32 v3, s33, 2			; GFX7-NEXT: s_mov_b32 s8, s33
	; GFX7-NEXT: s_mov_b32 s33, s32			; GFX7-NEXT: s_mov_b32 s33, s32
	; GFX7-NEXT: s_addk_i32 s32, 0x400			; GFX7-NEXT: s_addk_i32 s32, 0x400
	; GFX7-NEXT: s_getpc_b64 s[4:5]			; GFX7-NEXT: s_getpc_b64 s[4:5]
	; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX7-NEXT: v_writelane_b32 v3, s30, 0			; GFX7-NEXT: v_writelane_b32 v3, s30, 0
	; GFX7-NEXT: v_writelane_b32 v3, s31, 1			; GFX7-NEXT: v_writelane_b32 v3, s31, 1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX7-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX7-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; GFX7-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GFX7-NEXT: v_add_i32_e32 v4, vcc, 2, v2			; GFX7-NEXT: v_add_i32_e32 v4, vcc, 2, v2
	; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX7-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v3, 1			; GFX7-NEXT: v_readlane_b32 s31, v3, 1
	; GFX7-NEXT: v_readlane_b32 s30, v3, 0			; GFX7-NEXT: v_readlane_b32 s30, v3, 0
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_addk_i32 s32, 0xfc00
	; GFX7-NEXT: v_readlane_b32 s33, v3, 2			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v2bf16:			; GFX8-LABEL: test_call_v2bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX8-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: v_writelane_b32 v2, s33, 2			; GFX8-NEXT: s_mov_b32 s6, s33
	; GFX8-NEXT: s_mov_b32 s33, s32			; GFX8-NEXT: s_mov_b32 s33, s32
	; GFX8-NEXT: s_addk_i32 s32, 0x400			; GFX8-NEXT: s_addk_i32 s32, 0x400
	; GFX8-NEXT: s_getpc_b64 s[4:5]			; GFX8-NEXT: s_getpc_b64 s[4:5]
	; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX8-NEXT: v_writelane_b32 v2, s30, 0			; GFX8-NEXT: v_writelane_b32 v2, s30, 0
	; GFX8-NEXT: v_writelane_b32 v2, s31, 1			; GFX8-NEXT: v_writelane_b32 v2, s31, 1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX8-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v2, 1			; GFX8-NEXT: v_readlane_b32 s31, v2, 1
	; GFX8-NEXT: v_readlane_b32 s30, v2, 0			; GFX8-NEXT: v_readlane_b32 s30, v2, 0
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_addk_i32 s32, 0xfc00
	; GFX8-NEXT: v_readlane_b32 s33, v2, 2			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v2bf16:			; GFX9-LABEL: test_call_v2bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v2, s33, 2			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v2, 2			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v2bf16:			; GFX10-LABEL: test_call_v2bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v2, s33, 2			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v2, 2			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <2 x bfloat> @test_arg_store_v2bf16(<2 x bfloat> %in)			%result = call <2 x bfloat> @test_arg_store_v2bf16(<2 x bfloat> %in)
	store volatile <2 x bfloat> %result, ptr addrspace(5) %out			store volatile <2 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}

	define void @test_call_v3bf16(<3 x bfloat> %in, ptr addrspace(5) %out) {			define void @test_call_v3bf16(<3 x bfloat> %in, ptr addrspace(5) %out) {
	; GCN-LABEL: test_call_v3bf16:			; GCN-LABEL: test_call_v3bf16:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt expcnt(0)			; GCN-NEXT: s_mov_b32 s8, s33
	; GCN-NEXT: v_writelane_b32 v4, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: s_waitcnt expcnt(0)
	; GCN-NEXT: v_writelane_b32 v4, s30, 0			; GCN-NEXT: v_writelane_b32 v4, s30, 0
	; GCN-NEXT: v_writelane_b32 v4, s31, 1			; GCN-NEXT: v_writelane_b32 v4, s31, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v2			; GCN-NEXT: v_lshrrev_b32_e32 v2, 16, v2
	; GCN-NEXT: v_add_i32_e32 v5, vcc, 4, v3			; GCN-NEXT: v_add_i32_e32 v5, vcc, 4, v3
	; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16			; GCN-NEXT: v_alignbit_b32 v0, v1, v0, 16
	; GCN-NEXT: buffer_store_short v2, v5, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v2, v5, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen			; GCN-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v4, 1			; GCN-NEXT: v_readlane_b32 s31, v4, 1
	; GCN-NEXT: v_readlane_b32 s30, v4, 0			; GCN-NEXT: v_readlane_b32 s30, v4, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v4, 2			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v3bf16:			; GFX7-LABEL: test_call_v3bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_store_dword v4, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX7-NEXT: buffer_store_dword v4, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: v_writelane_b32 v4, s33, 2			; GFX7-NEXT: s_mov_b32 s8, s33
	; GFX7-NEXT: s_mov_b32 s33, s32			; GFX7-NEXT: s_mov_b32 s33, s32
	; GFX7-NEXT: s_addk_i32 s32, 0x400			; GFX7-NEXT: s_addk_i32 s32, 0x400
	; GFX7-NEXT: s_getpc_b64 s[4:5]			; GFX7-NEXT: s_getpc_b64 s[4:5]
	; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX7-NEXT: v_writelane_b32 v4, s30, 0			; GFX7-NEXT: v_writelane_b32 v4, s30, 0
	; GFX7-NEXT: v_writelane_b32 v4, s31, 1			; GFX7-NEXT: v_writelane_b32 v4, s31, 1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX7-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX7-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; GFX7-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GFX7-NEXT: v_alignbit_b32 v0, v1, v0, 16			; GFX7-NEXT: v_alignbit_b32 v0, v1, v0, 16
	; GFX7-NEXT: v_lshrrev_b32_e32 v1, 16, v2			; GFX7-NEXT: v_lshrrev_b32_e32 v1, 16, v2
	; GFX7-NEXT: v_add_i32_e32 v2, vcc, 4, v3			; GFX7-NEXT: v_add_i32_e32 v2, vcc, 4, v3
	; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v4, 1			; GFX7-NEXT: v_readlane_b32 s31, v4, 1
	; GFX7-NEXT: v_readlane_b32 s30, v4, 0			; GFX7-NEXT: v_readlane_b32 s30, v4, 0
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_addk_i32 s32, 0xfc00
	; GFX7-NEXT: v_readlane_b32 s33, v4, 2			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v4, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v4, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v3bf16:			; GFX8-LABEL: test_call_v3bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX8-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: v_writelane_b32 v3, s33, 2			; GFX8-NEXT: s_mov_b32 s6, s33
	; GFX8-NEXT: s_mov_b32 s33, s32			; GFX8-NEXT: s_mov_b32 s33, s32
	; GFX8-NEXT: s_addk_i32 s32, 0x400			; GFX8-NEXT: s_addk_i32 s32, 0x400
	; GFX8-NEXT: s_getpc_b64 s[4:5]			; GFX8-NEXT: s_getpc_b64 s[4:5]
	; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX8-NEXT: v_writelane_b32 v3, s30, 0			; GFX8-NEXT: v_writelane_b32 v3, s30, 0
	; GFX8-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX8-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX8-NEXT: v_writelane_b32 v3, s31, 1			; GFX8-NEXT: v_writelane_b32 v3, s31, 1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX8-NEXT: v_add_u32_e32 v4, vcc, 4, v2			; GFX8-NEXT: v_add_u32_e32 v4, vcc, 4, v2
	; GFX8-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v3, 1			; GFX8-NEXT: v_readlane_b32 s31, v3, 1
	; GFX8-NEXT: v_readlane_b32 s30, v3, 0			; GFX8-NEXT: v_readlane_b32 s30, v3, 0
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_addk_i32 s32, 0xfc00
	; GFX8-NEXT: v_readlane_b32 s33, v3, 2			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v3bf16:			; GFX9-LABEL: test_call_v3bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v3, s33, 2			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0			; GFX9-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v4			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v4
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-NEXT: v_writelane_b32 v3, s30, 0
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4			; GFX9-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v3, 2			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v3bf16:			; GFX10-LABEL: test_call_v3bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v3, s33, 2			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX10-NEXT: v_and_b32_e32 v4, 0xffff0000, v0			; GFX10-NEXT: v_and_b32_e32 v4, 0xffff0000, v0
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v3, s30, 0			; GFX10-NEXT: v_writelane_b32 v3, s30, 0
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_and_or_b32 v0, 0xffff, v0, v4			; GFX10-NEXT: v_and_or_b32 v0, 0xffff, v0, v4
	; GFX10-NEXT: v_writelane_b32 v3, s31, 1			; GFX10-NEXT: v_writelane_b32 v3, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4			; GFX10-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v3, 1			; GFX10-NEXT: v_readlane_b32 s31, v3, 1
	; GFX10-NEXT: v_readlane_b32 s30, v3, 0			; GFX10-NEXT: v_readlane_b32 s30, v3, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v3, 2			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <3 x bfloat> @test_arg_store_v2bf16(<3 x bfloat> %in)			%result = call <3 x bfloat> @test_arg_store_v2bf16(<3 x bfloat> %in)
	store volatile <3 x bfloat> %result, ptr addrspace(5) %out			store volatile <3 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}

	define void @test_call_v4bf16(<4 x bfloat> %in, ptr addrspace(5) %out) {			define void @test_call_v4bf16(<4 x bfloat> %in, ptr addrspace(5) %out) {
	; GCN-LABEL: test_call_v4bf16:			; GCN-LABEL: test_call_v4bf16:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt expcnt(0)			; GCN-NEXT: s_mov_b32 s8, s33
	; GCN-NEXT: v_writelane_b32 v5, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: s_waitcnt expcnt(0)
	; GCN-NEXT: v_writelane_b32 v5, s30, 0			; GCN-NEXT: v_writelane_b32 v5, s30, 0
	; GCN-NEXT: v_writelane_b32 v5, s31, 1			; GCN-NEXT: v_writelane_b32 v5, s31, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	[10 unchanged lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v1, v8, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v1, v8, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v5, 1			; GCN-NEXT: v_readlane_b32 s31, v5, 1
	; GCN-NEXT: v_readlane_b32 s30, v5, 0			; GCN-NEXT: v_readlane_b32 s30, v5, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v5, 2			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v4bf16:			; GFX7-LABEL: test_call_v4bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX7-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: v_writelane_b32 v5, s33, 2			; GFX7-NEXT: s_mov_b32 s8, s33
	; GFX7-NEXT: s_mov_b32 s33, s32			; GFX7-NEXT: s_mov_b32 s33, s32
	; GFX7-NEXT: s_addk_i32 s32, 0x400			; GFX7-NEXT: s_addk_i32 s32, 0x400
	; GFX7-NEXT: s_getpc_b64 s[4:5]			; GFX7-NEXT: s_getpc_b64 s[4:5]
	; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX7-NEXT: v_writelane_b32 v5, s30, 0			; GFX7-NEXT: v_writelane_b32 v5, s30, 0
	; GFX7-NEXT: v_writelane_b32 v5, s31, 1			; GFX7-NEXT: v_writelane_b32 v5, s31, 1
	[12 unchanged lines not shown]
	; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v5, 1			; GFX7-NEXT: v_readlane_b32 s31, v5, 1
	; GFX7-NEXT: v_readlane_b32 s30, v5, 0			; GFX7-NEXT: v_readlane_b32 s30, v5, 0
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_addk_i32 s32, 0xfc00
	; GFX7-NEXT: v_readlane_b32 s33, v5, 2			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v4bf16:			; GFX8-LABEL: test_call_v4bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX8-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: v_writelane_b32 v3, s33, 2			; GFX8-NEXT: s_mov_b32 s6, s33
	; GFX8-NEXT: s_mov_b32 s33, s32			; GFX8-NEXT: s_mov_b32 s33, s32
	; GFX8-NEXT: s_addk_i32 s32, 0x400			; GFX8-NEXT: s_addk_i32 s32, 0x400
	; GFX8-NEXT: s_getpc_b64 s[4:5]			; GFX8-NEXT: s_getpc_b64 s[4:5]
	; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX8-NEXT: v_writelane_b32 v3, s30, 0			; GFX8-NEXT: v_writelane_b32 v3, s30, 0
	; GFX8-NEXT: v_writelane_b32 v3, s31, 1			; GFX8-NEXT: v_writelane_b32 v3, s31, 1
	[10 unchanged lines not shown]
	; GFX8-NEXT: buffer_store_short v5, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v5, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v2
	; GFX8-NEXT: buffer_store_short v4, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v4, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v3, 1			; GFX8-NEXT: v_readlane_b32 s31, v3, 1
	; GFX8-NEXT: v_readlane_b32 s30, v3, 0			; GFX8-NEXT: v_readlane_b32 s30, v3, 0
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_addk_i32 s32, 0xfc00
	; GFX8-NEXT: v_readlane_b32 s33, v3, 2			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v4bf16:			; GFX9-LABEL: test_call_v4bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v3, s33, 2			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-NEXT: v_writelane_b32 v3, s30, 0
	; GFX9-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_store_short_d16_hi v1, v2, s[0:3], 0 offen offset:6			; GFX9-NEXT: buffer_store_short_d16_hi v1, v2, s[0:3], 0 offen offset:6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4			; GFX9-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short_d16_hi v0, v2, s[0:3], 0 offen offset:2			; GFX9-NEXT: buffer_store_short_d16_hi v0, v2, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v3, 2			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v4bf16:			; GFX10-LABEL: test_call_v4bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v3, s33, 2			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v3, s30, 0			; GFX10-NEXT: v_writelane_b32 v3, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v3, s31, 1			; GFX10-NEXT: v_writelane_b32 v3, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: buffer_store_short_d16_hi v1, v2, s[0:3], 0 offen offset:6			; GFX10-NEXT: buffer_store_short_d16_hi v1, v2, s[0:3], 0 offen offset:6
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4			; GFX10-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short_d16_hi v0, v2, s[0:3], 0 offen offset:2			; GFX10-NEXT: buffer_store_short_d16_hi v0, v2, s[0:3], 0 offen offset:2
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v3, 1			; GFX10-NEXT: v_readlane_b32 s31, v3, 1
	; GFX10-NEXT: v_readlane_b32 s30, v3, 0			; GFX10-NEXT: v_readlane_b32 s30, v3, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v3, 2			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <4 x bfloat> @test_arg_store_v2bf16(<4 x bfloat> %in)			%result = call <4 x bfloat> @test_arg_store_v2bf16(<4 x bfloat> %in)
	store volatile <4 x bfloat> %result, ptr addrspace(5) %out			store volatile <4 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}

	define void @test_call_v8bf16(<8 x bfloat> %in, ptr addrspace(5) %out) {			define void @test_call_v8bf16(<8 x bfloat> %in, ptr addrspace(5) %out) {
	; GCN-LABEL: test_call_v8bf16:			; GCN-LABEL: test_call_v8bf16:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt expcnt(0)			; GCN-NEXT: s_mov_b32 s8, s33
	; GCN-NEXT: v_writelane_b32 v9, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: s_waitcnt expcnt(0)
	; GCN-NEXT: v_writelane_b32 v9, s30, 0			; GCN-NEXT: v_writelane_b32 v9, s30, 0
	; GCN-NEXT: v_writelane_b32 v9, s31, 1			; GCN-NEXT: v_writelane_b32 v9, s31, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	[26 unchanged lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v1, v16, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v1, v16, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v9, 1			; GCN-NEXT: v_readlane_b32 s31, v9, 1
	; GCN-NEXT: v_readlane_b32 s30, v9, 0			; GCN-NEXT: v_readlane_b32 s30, v9, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v9, 2			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v8bf16:			; GFX7-LABEL: test_call_v8bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX7-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: v_writelane_b32 v9, s33, 2			; GFX7-NEXT: s_mov_b32 s8, s33
	; GFX7-NEXT: s_mov_b32 s33, s32			; GFX7-NEXT: s_mov_b32 s33, s32
	; GFX7-NEXT: s_addk_i32 s32, 0x400			; GFX7-NEXT: s_addk_i32 s32, 0x400
	; GFX7-NEXT: s_getpc_b64 s[4:5]			; GFX7-NEXT: s_getpc_b64 s[4:5]
	; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX7-NEXT: v_writelane_b32 v9, s30, 0			; GFX7-NEXT: v_writelane_b32 v9, s30, 0
	; GFX7-NEXT: v_writelane_b32 v9, s31, 1			; GFX7-NEXT: v_writelane_b32 v9, s31, 1
	[28 unchanged lines not shown]
	; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v9, 1			; GFX7-NEXT: v_readlane_b32 s31, v9, 1
	; GFX7-NEXT: v_readlane_b32 s30, v9, 0			; GFX7-NEXT: v_readlane_b32 s30, v9, 0
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_addk_i32 s32, 0xfc00
	; GFX7-NEXT: v_readlane_b32 s33, v9, 2			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v8bf16:			; GFX8-LABEL: test_call_v8bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX8-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: v_writelane_b32 v5, s33, 2			; GFX8-NEXT: s_mov_b32 s6, s33
	; GFX8-NEXT: s_mov_b32 s33, s32			; GFX8-NEXT: s_mov_b32 s33, s32
	; GFX8-NEXT: s_addk_i32 s32, 0x400			; GFX8-NEXT: s_addk_i32 s32, 0x400
	; GFX8-NEXT: s_getpc_b64 s[4:5]			; GFX8-NEXT: s_getpc_b64 s[4:5]
	; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX8-NEXT: v_writelane_b32 v5, s30, 0			; GFX8-NEXT: v_writelane_b32 v5, s30, 0
	; GFX8-NEXT: v_writelane_b32 v5, s31, 1			; GFX8-NEXT: v_writelane_b32 v5, s31, 1
	[24 unchanged lines not shown]
	; GFX8-NEXT: buffer_store_short v7, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v7, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v4			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v4
	; GFX8-NEXT: buffer_store_short v6, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v6, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v5, 1			; GFX8-NEXT: v_readlane_b32 s31, v5, 1
	; GFX8-NEXT: v_readlane_b32 s30, v5, 0			; GFX8-NEXT: v_readlane_b32 s30, v5, 0
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_addk_i32 s32, 0xfc00
	; GFX8-NEXT: v_readlane_b32 s33, v5, 2			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v8bf16:			; GFX9-LABEL: test_call_v8bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v5, s33, 2			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v5, s30, 0			; GFX9-NEXT: v_writelane_b32 v5, s30, 0
	; GFX9-NEXT: v_writelane_b32 v5, s31, 1			; GFX9-NEXT: v_writelane_b32 v5, s31, 1
	[13 unchanged lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short_d16_hi v0, v4, s[0:3], 0 offen offset:2			; GFX9-NEXT: buffer_store_short_d16_hi v0, v4, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v5, 1			; GFX9-NEXT: v_readlane_b32 s31, v5, 1
	; GFX9-NEXT: v_readlane_b32 s30, v5, 0			; GFX9-NEXT: v_readlane_b32 s30, v5, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v5, 2			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v8bf16:			; GFX10-LABEL: test_call_v8bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v5, s33, 2			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v5, s30, 0			; GFX10-NEXT: v_writelane_b32 v5, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v5, s31, 1			; GFX10-NEXT: v_writelane_b32 v5, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: buffer_store_short_d16_hi v3, v4, s[0:3], 0 offen offset:14			; GFX10-NEXT: buffer_store_short_d16_hi v3, v4, s[0:3], 0 offen offset:14
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v3, v4, s[0:3], 0 offen offset:12			; GFX10-NEXT: buffer_store_short v3, v4, s[0:3], 0 offen offset:12
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short_d16_hi v2, v4, s[0:3], 0 offen offset:10			; GFX10-NEXT: buffer_store_short_d16_hi v2, v4, s[0:3], 0 offen offset:10
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v2, v4, s[0:3], 0 offen offset:8			; GFX10-NEXT: buffer_store_short v2, v4, s[0:3], 0 offen offset:8
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short_d16_hi v1, v4, s[0:3], 0 offen offset:6			; GFX10-NEXT: buffer_store_short_d16_hi v1, v4, s[0:3], 0 offen offset:6
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen offset:4			; GFX10-NEXT: buffer_store_short v1, v4, s[0:3], 0 offen offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short_d16_hi v0, v4, s[0:3], 0 offen offset:2			; GFX10-NEXT: buffer_store_short_d16_hi v0, v4, s[0:3], 0 offen offset:2
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v5, 1			; GFX10-NEXT: v_readlane_b32 s31, v5, 1
	; GFX10-NEXT: v_readlane_b32 s30, v5, 0			; GFX10-NEXT: v_readlane_b32 s30, v5, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v5, 2			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v5, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <8 x bfloat> @test_arg_store_v2bf16(<8 x bfloat> %in)			%result = call <8 x bfloat> @test_arg_store_v2bf16(<8 x bfloat> %in)
	store volatile <8 x bfloat> %result, ptr addrspace(5) %out			store volatile <8 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}

	define void @test_call_v16bf16(<16 x bfloat> %in, ptr addrspace(5) %out) {			define void @test_call_v16bf16(<16 x bfloat> %in, ptr addrspace(5) %out) {
	; GCN-LABEL: test_call_v16bf16:			; GCN-LABEL: test_call_v16bf16:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt expcnt(0)			; GCN-NEXT: s_mov_b32 s8, s33
	; GCN-NEXT: v_writelane_b32 v17, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: s_waitcnt expcnt(0)
	; GCN-NEXT: v_writelane_b32 v17, s30, 0			; GCN-NEXT: v_writelane_b32 v17, s30, 0
	; GCN-NEXT: v_writelane_b32 v17, s31, 1			; GCN-NEXT: v_writelane_b32 v17, s31, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	[58 unchanged lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v1, v14, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v1, v14, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v17, 1			; GCN-NEXT: v_readlane_b32 s31, v17, 1
	; GCN-NEXT: v_readlane_b32 s30, v17, 0			; GCN-NEXT: v_readlane_b32 s30, v17, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v17, 2			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v16bf16:			; GFX7-LABEL: test_call_v16bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_store_dword v17, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX7-NEXT: buffer_store_dword v17, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: v_writelane_b32 v17, s33, 2			; GFX7-NEXT: s_mov_b32 s8, s33
	; GFX7-NEXT: s_mov_b32 s33, s32			; GFX7-NEXT: s_mov_b32 s33, s32
	; GFX7-NEXT: s_addk_i32 s32, 0x400			; GFX7-NEXT: s_addk_i32 s32, 0x400
	; GFX7-NEXT: s_getpc_b64 s[4:5]			; GFX7-NEXT: s_getpc_b64 s[4:5]
	; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX7-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX7-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX7-NEXT: v_writelane_b32 v17, s30, 0			; GFX7-NEXT: v_writelane_b32 v17, s30, 0
	; GFX7-NEXT: v_writelane_b32 v17, s31, 1			; GFX7-NEXT: v_writelane_b32 v17, s31, 1
	[60 lines not shown]
	; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v1, v2, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v17, 1			; GFX7-NEXT: v_readlane_b32 s31, v17, 1
	; GFX7-NEXT: v_readlane_b32 s30, v17, 0			; GFX7-NEXT: v_readlane_b32 s30, v17, 0
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_addk_i32 s32, 0xfc00
	; GFX7-NEXT: v_readlane_b32 s33, v17, 2			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v17, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v17, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v16bf16:			; GFX8-LABEL: test_call_v16bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX8-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: v_writelane_b32 v9, s33, 2			; GFX8-NEXT: s_mov_b32 s6, s33
	; GFX8-NEXT: s_mov_b32 s33, s32			; GFX8-NEXT: s_mov_b32 s33, s32
	; GFX8-NEXT: s_addk_i32 s32, 0x400			; GFX8-NEXT: s_addk_i32 s32, 0x400
	; GFX8-NEXT: s_getpc_b64 s[4:5]			; GFX8-NEXT: s_getpc_b64 s[4:5]
	; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX8-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX8-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX8-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX8-NEXT: v_writelane_b32 v9, s30, 0			; GFX8-NEXT: v_writelane_b32 v9, s30, 0
	; GFX8-NEXT: v_writelane_b32 v9, s31, 1			; GFX8-NEXT: v_writelane_b32 v9, s31, 1
	[52 lines not shown]
	; GFX8-NEXT: buffer_store_short v11, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v11, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v8			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v8
	; GFX8-NEXT: buffer_store_short v10, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v10, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v9, 1			; GFX8-NEXT: v_readlane_b32 s31, v9, 1
	; GFX8-NEXT: v_readlane_b32 s30, v9, 0			; GFX8-NEXT: v_readlane_b32 s30, v9, 0
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_addk_i32 s32, 0xfc00
	; GFX8-NEXT: v_readlane_b32 s33, v9, 2			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v16bf16:			; GFX9-LABEL: test_call_v16bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v9, s33, 2			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v9, s30, 0			; GFX9-NEXT: v_writelane_b32 v9, s30, 0
	; GFX9-NEXT: v_writelane_b32 v9, s31, 1			; GFX9-NEXT: v_writelane_b32 v9, s31, 1
	[29 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short_d16_hi v0, v8, s[0:3], 0 offen offset:2			; GFX9-NEXT: buffer_store_short_d16_hi v0, v8, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v9, 1			; GFX9-NEXT: v_readlane_b32 s31, v9, 1
	; GFX9-NEXT: v_readlane_b32 s30, v9, 0			; GFX9-NEXT: v_readlane_b32 s30, v9, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v9, 2			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v16bf16:			; GFX10-LABEL: test_call_v16bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v9, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v9, s33, 2			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, test_arg_store_v2bf16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, test_arg_store_v2bf16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v9, s30, 0			; GFX10-NEXT: v_writelane_b32 v9, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v9, s31, 1			; GFX10-NEXT: v_writelane_b32 v9, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: buffer_store_short_d16_hi v7, v8, s[0:3], 0 offen offset:30			; GFX10-NEXT: buffer_store_short_d16_hi v7, v8, s[0:3], 0 offen offset:30
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v7, v8, s[0:3], 0 offen offset:28			; GFX10-NEXT: buffer_store_short v7, v8, s[0:3], 0 offen offset:28
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short_d16_hi v6, v8, s[0:3], 0 offen offset:26			; GFX10-NEXT: buffer_store_short_d16_hi v6, v8, s[0:3], 0 offen offset:26
	[22 lines not shown]
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short_d16_hi v0, v8, s[0:3], 0 offen offset:2			; GFX10-NEXT: buffer_store_short_d16_hi v0, v8, s[0:3], 0 offen offset:2
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v9, 1			; GFX10-NEXT: v_readlane_b32 s31, v9, 1
	; GFX10-NEXT: v_readlane_b32 s30, v9, 0			; GFX10-NEXT: v_readlane_b32 s30, v9, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v9, 2			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <16 x bfloat> @test_arg_store_v2bf16(<16 x bfloat> %in)			%result = call <16 x bfloat> @test_arg_store_v2bf16(<16 x bfloat> %in)
	[413 lines not shown]

llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,CI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,CI %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=5 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN-V5 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=5 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN-V5 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-NOBUG %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-NOBUG %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=iceland -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-BUG %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=iceland -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-BUG %s

	; Make sure to run a GPU with the SGPR allocation bug.			; Make sure to run a GPU with the SGPR allocation bug.

	; GCN-LABEL: {{^}}use_vcc:			; GCN-LABEL: {{^}}use_vcc:
	; GCN: ; NumSgprs: 34			; GCN: ; NumSgprs: 34
	; GCN: ; NumVgprs: 0			; GCN: ; NumVgprs: 0
	define void @use_vcc() #1 {			define void @use_vcc() #1 {
	call void asm sideeffect "", "~{vcc}" () #0			call void asm sideeffect "", "~{vcc}" () #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_vcc:			; GCN-LABEL: {{^}}indirect_use_vcc:
	; GCN: v_writelane_b32 v40, s33, 2			; GCN: v_writelane_b32 v41, s33, 0
	; GCN: v_writelane_b32 v40, s30, 0			; GCN: v_writelane_b32 v40, s30, 0
	; GCN: v_writelane_b32 v40, s31, 1			; GCN: v_writelane_b32 v40, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN: v_readlane_b32 s31, v40, 1			; GCN: v_readlane_b32 s31, v40, 1
	; GCN: v_readlane_b32 s30, v40, 0			; GCN: v_readlane_b32 s30, v40, 0
	; GCN: v_readlane_b32 s33, v40, 2			; GCN: v_readlane_b32 s33, v41, 0
	; GCN: s_setpc_b64 s[30:31]			; GCN: s_setpc_b64 s[30:31]
	; GCN: ; NumSgprs: 36			; GCN: ; NumSgprs: 36
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define void @indirect_use_vcc() #1 {			define void @indirect_use_vcc() #1 {
	call void @use_vcc()			call void @use_vcc()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2level_use_vcc_kernel:			; GCN-LABEL: {{^}}indirect_2level_use_vcc_kernel:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; CI: ; NumSgprs: 38			; CI: ; NumSgprs: 38
	; VI-NOBUG: ; NumSgprs: 40			; VI-NOBUG: ; NumSgprs: 40
	; VI-BUG: ; NumSgprs: 96			; VI-BUG: ; NumSgprs: 96
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define amdgpu_kernel void @indirect_2level_use_vcc_kernel(ptr addrspace(1) %out) #0 {			define amdgpu_kernel void @indirect_2level_use_vcc_kernel(ptr addrspace(1) %out) #0 {
	call void @indirect_use_vcc()			call void @indirect_use_vcc()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_flat_scratch:			; GCN-LABEL: {{^}}use_flat_scratch:
	; CI: ; NumSgprs: 36			; CI: ; NumSgprs: 36
	; VI: ; NumSgprs: 38			; VI: ; NumSgprs: 38
	; GCN: ; NumVgprs: 0			; GCN: ; NumVgprs: 0
	define void @use_flat_scratch() #1 {			define void @use_flat_scratch() #1 {
	call void asm sideeffect "", "~{flat_scratch}" () #0			call void asm sideeffect "", "~{flat_scratch}" () #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_flat_scratch:			; GCN-LABEL: {{^}}indirect_use_flat_scratch:
	; CI: ; NumSgprs: 38			; CI: ; NumSgprs: 38
	; VI: ; NumSgprs: 40			; VI: ; NumSgprs: 40
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define void @indirect_use_flat_scratch() #1 {			define void @indirect_use_flat_scratch() #1 {
	call void @use_flat_scratch()			call void @use_flat_scratch()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2level_use_flat_scratch_kernel:			; GCN-LABEL: {{^}}indirect_2level_use_flat_scratch_kernel:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; CI: ; NumSgprs: 38			; CI: ; NumSgprs: 38
	; VI-NOBUG: ; NumSgprs: 40			; VI-NOBUG: ; NumSgprs: 40
	; VI-BUG: ; NumSgprs: 96			; VI-BUG: ; NumSgprs: 96
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define amdgpu_kernel void @indirect_2level_use_flat_scratch_kernel(ptr addrspace(1) %out) #0 {			define amdgpu_kernel void @indirect_2level_use_flat_scratch_kernel(ptr addrspace(1) %out) #0 {
	call void @indirect_use_flat_scratch()			call void @indirect_use_flat_scratch()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_10_vgpr:			; GCN-LABEL: {{^}}use_10_vgpr:
	; GCN: ; NumVgprs: 10			; GCN: ; NumVgprs: 10
	define void @use_10_vgpr() #1 {			define void @use_10_vgpr() #1 {
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4}"() #0			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4}"() #0
	call void asm sideeffect "", "~{v5},~{v6},~{v7},~{v8},~{v9}"() #0			call void asm sideeffect "", "~{v5},~{v6},~{v7},~{v8},~{v9}"() #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_10_vgpr:			; GCN-LABEL: {{^}}indirect_use_10_vgpr:
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define void @indirect_use_10_vgpr() #0 {			define void @indirect_use_10_vgpr() #0 {
	call void @use_10_vgpr()			call void @use_10_vgpr()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2_level_use_10_vgpr:			; GCN-LABEL: {{^}}indirect_2_level_use_10_vgpr:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define amdgpu_kernel void @indirect_2_level_use_10_vgpr() #0 {			define amdgpu_kernel void @indirect_2_level_use_10_vgpr() #0 {
	call void @indirect_use_10_vgpr()			call void @indirect_use_10_vgpr()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_50_vgpr:			; GCN-LABEL: {{^}}use_50_vgpr:
	; GCN: ; NumVgprs: 50			; GCN: ; NumVgprs: 50
	define void @use_50_vgpr() #1 {			define void @use_50_vgpr() #1 {
	[188 lines not shown]

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

[17 lines not shown]	define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "", ""() #0		call void asm sideeffect "", ""() #0
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:		; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
; MUBUF: buffer_store_dword		; MUBUF: buffer_store_dword
		; MUBUF: buffer_store_dword
		; FLATSCR: scratch_store_dword
; FLATSCR: scratch_store_dword		; FLATSCR: scratch_store_dword
; GCN: v_writelane_b32 v40, s33, 4
; GCN: v_writelane_b32 v40, s30, 0		; GCN: v_writelane_b32 v40, s30, 0
; GCN: v_writelane_b32 v40, s31, 1		; GCN: v_writelane_b32 v40, s31, 1
		; GCN: v_writelane_b32 v41, s33, 0
; GCN: v_writelane_b32 v40, s34, 2		; GCN: v_writelane_b32 v40, s34, 2
; GCN: v_writelane_b32 v40, s35, 3		; GCN: v_writelane_b32 v40, s35, 3

; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN: v_readlane_b32 s35, v40, 3		; GCN: v_readlane_b32 s35, v40, 3
; GCN: v_readlane_b32 s34, v40, 2		; GCN: v_readlane_b32 s34, v40, 2
; MUBUF-DAG: v_readlane_b32 s31, v40, 1		; MUBUF-DAG: v_readlane_b32 s31, v40, 1
; MUBUF-DAG: v_readlane_b32 s30, v40, 0		; MUBUF-DAG: v_readlane_b32 s30, v40, 0
; FLATSCR-DAG: v_readlane_b32 s31, v40, 1		; FLATSCR-DAG: v_readlane_b32 s31, v40, 1
; FLATSCR-DAG: v_readlane_b32 s30, v40, 0		; FLATSCR-DAG: v_readlane_b32 s30, v40, 0

; GCN: v_readlane_b32 s33, v40, 4		; GCN: v_readlane_b32 s33, v41, 0
; MUBUF: buffer_load_dword		; MUBUF: buffer_load_dword
		; MUBUF: buffer_load_dword
		; FLATSCR: scratch_load_dword
; FLATSCR: scratch_load_dword		; FLATSCR: scratch_load_dword
; GCN: s_setpc_b64 s[30:31]		; GCN: s_setpc_b64 s[30:31]
define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {		define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "", ""() #0		call void asm sideeffect "", ""() #0
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:		; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:
; MUBUF: buffer_store_dword v40		; MUBUF: buffer_store_dword v40
		; MUBUF: buffer_store_dword v41
; FLATSCR: scratch_store_dword off, v40		; FLATSCR: scratch_store_dword off, v40
; GCN: v_writelane_b32 v40, s33, 4		; FLATSCR: scratch_store_dword off, v41
		; GCN: v_writelane_b32 v41, s33, 0

; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; MUBUF: s_addk_i32 s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; FLATSCR: s_add_i32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64

; GCN: v_readlane_b32 s33, v40, 4		; GCN: v_readlane_b32 s33, v41, 0
; MUBUF: buffer_load_dword v40		; MUBUF: buffer_load_dword v40
		; MUBUF: buffer_load_dword v41
; FLATSCR: scratch_load_dword v40		; FLATSCR: scratch_load_dword v40
		; FLATSCR: scratch_load_dword v41
define void @test_func_call_external_void_funcx2() #0 {		define void @test_func_call_external_void_funcx2() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:		; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:
; GCN: s_waitcnt		; GCN: s_waitcnt
[271 lines not shown]

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

[81 lines not shown]	define void @callee_with_stack_no_fp_elim_non_leaf() #2 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_with_stack_and_call:		; GCN-LABEL: {{^}}callee_with_stack_and_call:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR_1:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR_1:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN: v_writelane_b32 [[CSR_VGPR_1]], s33, 0
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; MUBUF-DAG: s_addk_i32 s32, 0x400{{$}}		; MUBUF-DAG: s_addk_i32 s32, 0x400{{$}}
; FLATSCR-DAG: s_add_i32 s32, s32, 16{{$}}		; FLATSCR-DAG: s_add_i32 s32, s32, 16{{$}}
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,

; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}		; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]]		; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]]
; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]]		; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]]

; MUBUF: s_addk_i32 s32, 0xfc00{{$}}		; MUBUF: s_addk_i32 s32, 0xfc00{{$}}
; FLATSCR: s_add_i32 s32, s32, -16{{$}}		; FLATSCR: s_add_i32 s32, s32, -16{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR_1]], 0
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR_1]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR_1]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)

; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @callee_with_stack_and_call() #0 {		define void @callee_with_stack_and_call() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, ptr addrspace(5) %alloca		store volatile i32 0, ptr addrspace(5) %alloca
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; Should be able to copy incoming stack pointer directly to inner		; Should be able to copy incoming stack pointer directly to inner
; call's stack pointer argument.		; call's stack pointer argument.

; There is stack usage only because of the need to evict a VGPR for		; There is stack usage only because of the need to evict a VGPR for
; spilling CSR SGPRs.		; spilling CSR SGPRs.

; GCN-LABEL: {{^}}callee_no_stack_with_call:		; GCN-LABEL: {{^}}callee_no_stack_with_call:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR_1:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR_1:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; MUBUF-DAG: s_addk_i32 s32, 0x400		; MUBUF-DAG: s_addk_i32 s32, 0x400
; FLATSCR-DAG: s_add_i32 s32, s32, 16		; FLATSCR-DAG: s_add_i32 s32, s32, 16
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s33, [[FP_SPILL_LANE:[0-9]+]]		; GCN-DAG: v_writelane_b32 [[CSR_VGPR_1]], s33, [[FP_SPILL_LANE:[0-9]+]]

; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]], 0		; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]], 0
; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]], 1		; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]], 1

; MUBUF: s_addk_i32 s32, 0xfc00		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_i32 s32, s32, -16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], [[FP_SPILL_LANE]]		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR_1]], [[FP_SPILL_LANE]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR_1]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR_1]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @callee_no_stack_with_call() #0 {		define void @callee_no_stack_with_call() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

[102 lines not shown]

; Use a copy to a free SGPR instead of introducing a second CSR VGPR.		; Use a copy to a free SGPR instead of introducing a second CSR VGPR.
; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:		; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 v0, s33, 63
; GCN-COUNT-60: v_writelane_b32 v0		; GCN-COUNT-60: v_writelane_b32 v0
		; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; GCN: v_writelane_b32 v0		; GCN: v_writelane_b32 v0
; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill		; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; GCN: v_writelane_b32 v0		; GCN: v_writelane_b32 v0
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:4		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:4
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:4		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:4
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_writelane_b32 v0		; GCN: v_writelane_b32 v0

; MUBUF: s_addk_i32 s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; MUBUF: s_addk_i32 s32, 0xfc00		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_i32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; FLATSCR: s_add_i32 s32, s32, -16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN-NEXT: v_readlane_b32 s33, v0, 63		; GCN-NEXT: s_mov_b32 s33, [[TMP_SGPR]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @last_lane_vgpr_for_fp_csr() #1 {		define void @last_lane_vgpr_for_fp_csr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
[87 lines not shown]
}		}

; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; MUBUF: s_addk_i32 s32, 0x300		; MUBUF: s_addk_i32 s32, 0x300
; FLATSCR: s_add_i32 s32, s32, 12		; FLATSCR: s_add_i32 s32, s32, 12
; GCN: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN: v_writelane_b32 [[CSR_VGPR]], s31, 1
; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
; FLATSCR: scratch_store_dword off, [[ZERO]], s33{{$}}		; FLATSCR: scratch_store_dword off, [[ZERO]], s33{{$}}
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1		; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1
; GCN: v_readlane_b32 s30, [[CSR_VGPR]], 0		; GCN: v_readlane_b32 s30, [[CSR_VGPR]], 0
; MUBUF: s_addk_i32 s32, 0xfd00		; MUBUF: s_addk_i32 s32, 0xfd00
; FLATSCR: s_add_i32 s32, s32, -12		; FLATSCR: s_add_i32 s32, s32, -12
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: s_mov_b32 s33, vcc_lo
		arsenm: Why the behavior change? Is this restored in a later patch?
		cdevadas (author): It's already been discussed; Jay asked about the same thing earlier in this review. I'm planning a follow-up patch to regain it. Using the VRM map, the unused lanes of the last virtual VGPR allocated for SGPR spilling can be tracked and then used later during FrameLowering when spilling FP/BP.
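		To make the lane-reuse idea in the reply above concrete, here is a minimal, self-contained C++ sketch. It is not LLVM code and not the follow-up patch: `LaneTracker`, `SpillLane`, `allocate` and `tryReuseLastVGPR` are hypothetical names invented for illustration, and the model assumes SGPR spills fill the lanes of each spill VGPR in order. It only shows how remembering the first unused lane of the last spill VGPR would let an FP/BP save land there instead of claiming a fresh VGPR.

```cpp
#include <cassert>

// Hypothetical model, not LLVM code: one SGPR is saved per VGPR lane.
struct SpillLane {
  unsigned VGPRIndex; // index into the list of VGPRs used for SGPR spills
  unsigned Lane;      // lane within that VGPR
};

class LaneTracker {
  unsigned WaveSize;     // lanes per VGPR (64 for wave64, 32 for wave32)
  unsigned NumVGPRs = 0; // spill VGPRs allocated so far
  unsigned NextLane = 0; // first unused lane in the last spill VGPR

public:
  explicit LaneTracker(unsigned WaveSize) : WaveSize(WaveSize) {
    assert(WaveSize > 0 && "a VGPR must have at least one lane");
  }

  // Allocate one lane for an SGPR spill, starting a new VGPR when the
  // current one is full (or when none has been allocated yet).
  SpillLane allocate() {
    if (NumVGPRs == 0 || NextLane == WaveSize) {
      ++NumVGPRs;
      NextLane = 0;
    }
    SpillLane L = {NumVGPRs - 1, NextLane};
    ++NextLane;
    return L;
  }

  // Try to place a late spill (e.g. FP/BP) into a free lane of the last
  // spill VGPR. Returns false, without allocating, if no lane is free.
  bool tryReuseLastVGPR(SpillLane &Out) {
    if (NumVGPRs == 0 || NextLane == WaveSize)
      return false;
    Out.VGPRIndex = NumVGPRs - 1;
    Out.Lane = NextLane;
    ++NextLane;
    return true;
  }

  unsigned numVGPRs() const { return NumVGPRs; }
};
```

		In this model, spilling s30 and s31 takes lanes 0 and 1 of the first spill VGPR, and a later `tryReuseLastVGPR` call yields lane 2 of the same register, which mirrors the `v_writelane_b32 [[CSR_VGPR]], s33, 2` pattern on the left-hand side of this diff.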
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @no_unused_non_csr_sgpr_for_fp() #1 {		define void @no_unused_non_csr_sgpr_for_fp() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
[11 lines not shown]

; Need a new CSR VGPR to satisfy the FP spill.		; Need a new CSR VGPR to satisfy the FP spill.
; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; MUBUF: s_addk_i32 s32, 0x300{{$}}		; MUBUF: s_addk_i32 s32, 0x300{{$}}
; FLATSCR: s_add_i32 s32, s32, 12{{$}}		; FLATSCR: s_add_i32 s32, s32, 12{{$}}

; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: s_addk_i32 s32, 0xfd00{{$}}		; MUBUF: s_addk_i32 s32, 0xfd00{{$}}
; FLATSCR: s_add_i32 s32, s32, -12{{$}}		; FLATSCR: s_add_i32 s32, s32, -12{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: s_mov_b32 s33, vcc_lo
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {		define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
[20 lines not shown]
; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:		; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}		; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}
; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}		; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}
; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}		; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}
; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}		; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: s_mov_b32 s33, vcc_lo
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
[23 lines not shown]
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define internal void @local_empty_func() #0 {		define internal void @local_empty_func() #0 {
ret void		ret void
}		}

; An FP is needed, despite not needing any spills		; An FP is needed, despite not needing any spills
; TODO: Could see callee does not use stack and omit FP.		; TODO: Could see callee does not use stack and omit FP.
; GCN-LABEL: {{^}}ipra_call_with_stack:		; GCN-LABEL: {{^}}ipra_call_with_stack:
; GCN: v_writelane_b32 v0, s33, 2		; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; MUBUF: s_addk_i32 s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; FLATSCR: s_add_i32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; MUBUF: s_addk_i32 s32, 0xfc00		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_i32 s32, s32, -16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN: v_readlane_b32 s33, v0, 2		; GCN: s_mov_b32 s33, [[TMP_SGPR]]
define void @ipra_call_with_stack() #0 {		define void @ipra_call_with_stack() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, ptr addrspace(5) %alloca		store volatile i32 0, ptr addrspace(5) %alloca
call void @local_empty_func()		call void @local_empty_func()
ret void		ret void
}		}

; With no free registers, we must spill the FP to memory.		; With no free registers, we must spill the FP to memory.
[141 lines not shown]

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

	[23 lines not shown]


	define float @call_split_type_used_outside_block_v2f32() #0 {			define float @call_split_type_used_outside_block_v2f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v2f32:			; GCN-LABEL: call_split_type_used_outside_block_v2f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v2f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v2f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <2 x float> @func_v2f32()			%split.ret.type = call <2 x float> @func_v2f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <2 x float> %split.ret.type, i32 0			%extract = extractelement <2 x float> %split.ret.type, i32 0
	ret float %extract			ret float %extract
	}			}

	define float @call_split_type_used_outside_block_v3f32() #0 {			define float @call_split_type_used_outside_block_v3f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v3f32:			; GCN-LABEL: call_split_type_used_outside_block_v3f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v3f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v3f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <3 x float> @func_v3f32()			%split.ret.type = call <3 x float> @func_v3f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <3 x float> %split.ret.type, i32 0			%extract = extractelement <3 x float> %split.ret.type, i32 0
	ret float %extract			ret float %extract
	}			}

	define half @call_split_type_used_outside_block_v4f16() #0 {			define half @call_split_type_used_outside_block_v4f16() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v4f16:			; GCN-LABEL: call_split_type_used_outside_block_v4f16:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v4f16@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v4f16@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <4 x half> @func_v4f16()			%split.ret.type = call <4 x half> @func_v4f16()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <4 x half> %split.ret.type, i32 0			%extract = extractelement <4 x half> %split.ret.type, i32 0
	ret half %extract			ret half %extract
	}			}

	define { i32, half } @call_split_type_used_outside_block_struct() #0 {			define { i32, half } @call_split_type_used_outside_block_struct() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_struct:			; GCN-LABEL: call_split_type_used_outside_block_struct:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_struct@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_struct@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_struct@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_struct@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_mov_b32_e32 v1, v4			; GCN-NEXT: v_mov_b32_e32 v1, v4
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()			%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()
	br label %bb1			br label %bb1

	bb1:			bb1:
	[125 lines not shown]

llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll

	[12 lines not shown]
	; CHECK: .Lfunc_begin0:			; CHECK: .Lfunc_begin0:
	; CHECK-NEXT: .loc 1 288 0 ; dummy:288:0			; CHECK-NEXT: .loc 1 288 0 ; dummy:288:0
	; CHECK-NEXT: .cfi_sections .debug_frame			; CHECK-NEXT: .cfi_sections .debug_frame
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1			; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[16:17]			; CHECK-NEXT: s_mov_b64 exec, s[16:17]
	; CHECK-NEXT: v_writelane_b32 v40, s33, 16
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: v_writelane_b32 v40, s34, 2			; CHECK-NEXT: v_writelane_b32 v40, s34, 2
	; CHECK-NEXT: v_writelane_b32 v40, s35, 3			; CHECK-NEXT: v_writelane_b32 v40, s35, 3
	; CHECK-NEXT: v_writelane_b32 v40, s36, 4			; CHECK-NEXT: v_writelane_b32 v40, s36, 4
	; CHECK-NEXT: v_writelane_b32 v40, s37, 5			; CHECK-NEXT: v_writelane_b32 v40, s37, 5
	; CHECK-NEXT: v_writelane_b32 v40, s38, 6			; CHECK-NEXT: v_writelane_b32 v40, s38, 6
	; CHECK-NEXT: v_writelane_b32 v40, s39, 7			; CHECK-NEXT: v_writelane_b32 v40, s39, 7
	; CHECK-NEXT: v_writelane_b32 v40, s40, 8			; CHECK-NEXT: v_writelane_b32 v40, s40, 8
	; CHECK-NEXT: v_writelane_b32 v40, s41, 9			; CHECK-NEXT: v_writelane_b32 v40, s41, 9
	; CHECK-NEXT: v_writelane_b32 v40, s42, 10			; CHECK-NEXT: v_writelane_b32 v40, s42, 10
	; CHECK-NEXT: v_writelane_b32 v40, s43, 11			; CHECK-NEXT: v_writelane_b32 v40, s43, 11
	; CHECK-NEXT: v_writelane_b32 v40, s44, 12			; CHECK-NEXT: v_writelane_b32 v40, s44, 12
				; CHECK-NEXT: v_writelane_b32 v42, s33, 0
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_addk_i32 s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s45, 13			; CHECK-NEXT: v_writelane_b32 v40, s45, 13
	; CHECK-NEXT: v_writelane_b32 v40, s46, 14			; CHECK-NEXT: v_writelane_b32 v40, s46, 14
	; CHECK-NEXT: s_mov_b64 s[40:41], s[4:5]			; CHECK-NEXT: s_mov_b64 s[40:41], s[4:5]
	; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- undef			; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- undef
	; CHECK-NEXT: .Ltmp0:			; CHECK-NEXT: .Ltmp0:
	; CHECK-NEXT: .loc 1 49 9 prologue_end ; dummy:49:9			; CHECK-NEXT: .loc 1 49 9 prologue_end ; dummy:49:9
	[42 lines not shown]
	; CHECK-NEXT: v_readlane_b32 s38, v40, 6			; CHECK-NEXT: v_readlane_b32 s38, v40, 6
	; CHECK-NEXT: v_readlane_b32 s37, v40, 5			; CHECK-NEXT: v_readlane_b32 s37, v40, 5
	; CHECK-NEXT: v_readlane_b32 s36, v40, 4			; CHECK-NEXT: v_readlane_b32 s36, v40, 4
	; CHECK-NEXT: v_readlane_b32 s35, v40, 3			; CHECK-NEXT: v_readlane_b32 s35, v40, 3
	; CHECK-NEXT: v_readlane_b32 s34, v40, 2			; CHECK-NEXT: v_readlane_b32 s34, v40, 2
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: s_addk_i32 s32, 0xfc00			; CHECK-NEXT: s_addk_i32 s32, 0xfc00
	; CHECK-NEXT: v_readlane_b32 s33, v40, 16			; CHECK-NEXT: v_readlane_b32 s33, v42, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	%2 = call ptr @__kmpc_alloc_shared(), !dbg !43			%2 = call ptr @__kmpc_alloc_shared(), !dbg !43
	%3 = call ptr @__kmpc_alloc_shared()			%3 = call ptr @__kmpc_alloc_shared()
	store i32 0, ptr %3, align 4			store i32 0, ptr %3, align 4
	call void @llvm.dbg.declare(metadata ptr %3, metadata !40, metadata !DIExpression()), !dbg !43			call void @llvm.dbg.declare(metadata ptr %3, metadata !40, metadata !DIExpression()), !dbg !43
	[52 lines not shown]

llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=true < %s | FileCheck -check-prefix=SPILL-TO-VGPR %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=true < %s | FileCheck -check-prefix=SPILL-TO-VGPR %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=false < %s | FileCheck -check-prefix=NO-SPILL-TO-VGPR %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=false < %s | FileCheck -check-prefix=NO-SPILL-TO-VGPR %s

	; Check frame setup where SGPR spills to VGPRs are disabled or enabled.			; Check frame setup where SGPR spills to VGPRs are disabled or enabled.

	declare hidden void @external_void_func_void() #0			declare hidden void @external_void_func_void() #0

	define void @callee_with_stack_and_call() #0 {			define void @callee_with_stack_and_call() #0 {
	; SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:			; SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:
	; SPILL-TO-VGPR: ; %bb.0:			; SPILL-TO-VGPR: ; %bb.0:
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; SPILL-TO-VGPR-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s33, 2			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v41, s33, 0
	; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32			; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32
	; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0
	; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0			; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33			; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]			; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]
	; SPILL-TO-VGPR-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; SPILL-TO-VGPR-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]			; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s31, v40, 1			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s31, v40, 1
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v40, 0			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v40, 0
	; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xfc00			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xfc00
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s33, v40, 2			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s33, v41, 0
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1
	; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; SPILL-TO-VGPR-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]			; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]
	;			;
	; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:			; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:
	; NO-SPILL-TO-VGPR: ; %bb.0:			; NO-SPILL-TO-VGPR: ; %bb.0:
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, s33			; NO-SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, s33
	[56 lines not shown]

llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=SDAG -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=SDAG -enable-var-scope %s
	; RUN: llc -global-isel -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GISEL -enable-var-scope %s			; RUN: llc -global-isel -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GISEL -enable-var-scope %s

	declare void @extern_c_func()			declare void @extern_c_func()

	define amdgpu_gfx void @gfx_func() {			define amdgpu_gfx void @gfx_func() {
	; SDAG-LABEL: gfx_func:			; SDAG-LABEL: gfx_func:
	; SDAG: ; %bb.0:			; SDAG: ; %bb.0:
	; SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; SDAG-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
	; SDAG-NEXT: v_writelane_b32 v40, s33, 28
	; SDAG-NEXT: v_writelane_b32 v40, s4, 0			; SDAG-NEXT: v_writelane_b32 v40, s4, 0
	; SDAG-NEXT: v_writelane_b32 v40, s5, 1			; SDAG-NEXT: v_writelane_b32 v40, s5, 1
	; SDAG-NEXT: v_writelane_b32 v40, s6, 2			; SDAG-NEXT: v_writelane_b32 v40, s6, 2
	; SDAG-NEXT: v_writelane_b32 v40, s7, 3			; SDAG-NEXT: v_writelane_b32 v40, s7, 3
	; SDAG-NEXT: v_writelane_b32 v40, s8, 4			; SDAG-NEXT: v_writelane_b32 v40, s8, 4
	; SDAG-NEXT: v_writelane_b32 v40, s9, 5			; SDAG-NEXT: v_writelane_b32 v40, s9, 5
	; SDAG-NEXT: v_writelane_b32 v40, s10, 6			; SDAG-NEXT: v_writelane_b32 v40, s10, 6
	; SDAG-NEXT: v_writelane_b32 v40, s11, 7			; SDAG-NEXT: v_writelane_b32 v40, s11, 7
	; SDAG-NEXT: v_writelane_b32 v40, s12, 8			; SDAG-NEXT: v_writelane_b32 v40, s12, 8
	; SDAG-NEXT: v_writelane_b32 v40, s13, 9			; SDAG-NEXT: v_writelane_b32 v40, s13, 9
	; SDAG-NEXT: v_writelane_b32 v40, s14, 10			; SDAG-NEXT: v_writelane_b32 v40, s14, 10
	; SDAG-NEXT: v_writelane_b32 v40, s15, 11			; SDAG-NEXT: v_writelane_b32 v40, s15, 11
	; SDAG-NEXT: v_writelane_b32 v40, s16, 12			; SDAG-NEXT: v_writelane_b32 v40, s16, 12
	; SDAG-NEXT: v_writelane_b32 v40, s17, 13			; SDAG-NEXT: v_writelane_b32 v40, s17, 13
	; SDAG-NEXT: v_writelane_b32 v40, s18, 14			; SDAG-NEXT: v_writelane_b32 v40, s18, 14
	; SDAG-NEXT: v_writelane_b32 v40, s19, 15			; SDAG-NEXT: v_writelane_b32 v40, s19, 15
	; SDAG-NEXT: v_writelane_b32 v40, s20, 16			; SDAG-NEXT: v_writelane_b32 v40, s20, 16
	; SDAG-NEXT: v_writelane_b32 v40, s21, 17			; SDAG-NEXT: v_writelane_b32 v40, s21, 17
	; SDAG-NEXT: v_writelane_b32 v40, s22, 18			; SDAG-NEXT: v_writelane_b32 v40, s22, 18
	; SDAG-NEXT: v_writelane_b32 v40, s23, 19			; SDAG-NEXT: v_writelane_b32 v40, s23, 19
				; SDAG-NEXT: s_mov_b32 s36, s33
	; SDAG-NEXT: s_mov_b32 s33, s32			; SDAG-NEXT: s_mov_b32 s33, s32
	; SDAG-NEXT: s_addk_i32 s32, 0x400			; SDAG-NEXT: s_addk_i32 s32, 0x400
	; SDAG-NEXT: v_writelane_b32 v40, s24, 20			; SDAG-NEXT: v_writelane_b32 v40, s24, 20
	; SDAG-NEXT: v_writelane_b32 v40, s25, 21			; SDAG-NEXT: v_writelane_b32 v40, s25, 21
	; SDAG-NEXT: s_getpc_b64 s[34:35]			; SDAG-NEXT: s_getpc_b64 s[34:35]
	; SDAG-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4			; SDAG-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4
	; SDAG-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12			; SDAG-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12
	; SDAG-NEXT: v_writelane_b32 v40, s26, 22			; SDAG-NEXT: v_writelane_b32 v40, s26, 22
	[30 lines not shown]
	; SDAG-NEXT: v_readlane_b32 s10, v40, 6			; SDAG-NEXT: v_readlane_b32 s10, v40, 6
	; SDAG-NEXT: v_readlane_b32 s9, v40, 5			; SDAG-NEXT: v_readlane_b32 s9, v40, 5
	; SDAG-NEXT: v_readlane_b32 s8, v40, 4			; SDAG-NEXT: v_readlane_b32 s8, v40, 4
	; SDAG-NEXT: v_readlane_b32 s7, v40, 3			; SDAG-NEXT: v_readlane_b32 s7, v40, 3
	; SDAG-NEXT: v_readlane_b32 s6, v40, 2			; SDAG-NEXT: v_readlane_b32 s6, v40, 2
	; SDAG-NEXT: v_readlane_b32 s5, v40, 1			; SDAG-NEXT: v_readlane_b32 s5, v40, 1
	; SDAG-NEXT: v_readlane_b32 s4, v40, 0			; SDAG-NEXT: v_readlane_b32 s4, v40, 0
	; SDAG-NEXT: s_addk_i32 s32, 0xfc00			; SDAG-NEXT: s_addk_i32 s32, 0xfc00
	; SDAG-NEXT: v_readlane_b32 s33, v40, 28			; SDAG-NEXT: s_mov_b32 s33, s36
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
	; SDAG-NEXT: s_waitcnt vmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0)
	; SDAG-NEXT: s_setpc_b64 s[30:31]			; SDAG-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: gfx_func:			; GISEL-LABEL: gfx_func:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 28
	; GISEL-NEXT: v_writelane_b32 v40, s4, 0			; GISEL-NEXT: v_writelane_b32 v40, s4, 0
	; GISEL-NEXT: v_writelane_b32 v40, s5, 1			; GISEL-NEXT: v_writelane_b32 v40, s5, 1
	; GISEL-NEXT: v_writelane_b32 v40, s6, 2			; GISEL-NEXT: v_writelane_b32 v40, s6, 2
	; GISEL-NEXT: v_writelane_b32 v40, s7, 3			; GISEL-NEXT: v_writelane_b32 v40, s7, 3
	; GISEL-NEXT: v_writelane_b32 v40, s8, 4			; GISEL-NEXT: v_writelane_b32 v40, s8, 4
	; GISEL-NEXT: v_writelane_b32 v40, s9, 5			; GISEL-NEXT: v_writelane_b32 v40, s9, 5
	; GISEL-NEXT: v_writelane_b32 v40, s10, 6			; GISEL-NEXT: v_writelane_b32 v40, s10, 6
	; GISEL-NEXT: v_writelane_b32 v40, s11, 7			; GISEL-NEXT: v_writelane_b32 v40, s11, 7
	; GISEL-NEXT: v_writelane_b32 v40, s12, 8			; GISEL-NEXT: v_writelane_b32 v40, s12, 8
	; GISEL-NEXT: v_writelane_b32 v40, s13, 9			; GISEL-NEXT: v_writelane_b32 v40, s13, 9
	; GISEL-NEXT: v_writelane_b32 v40, s14, 10			; GISEL-NEXT: v_writelane_b32 v40, s14, 10
	; GISEL-NEXT: v_writelane_b32 v40, s15, 11			; GISEL-NEXT: v_writelane_b32 v40, s15, 11
	; GISEL-NEXT: v_writelane_b32 v40, s16, 12			; GISEL-NEXT: v_writelane_b32 v40, s16, 12
	; GISEL-NEXT: v_writelane_b32 v40, s17, 13			; GISEL-NEXT: v_writelane_b32 v40, s17, 13
	; GISEL-NEXT: v_writelane_b32 v40, s18, 14			; GISEL-NEXT: v_writelane_b32 v40, s18, 14
	; GISEL-NEXT: v_writelane_b32 v40, s19, 15			; GISEL-NEXT: v_writelane_b32 v40, s19, 15
	; GISEL-NEXT: v_writelane_b32 v40, s20, 16			; GISEL-NEXT: v_writelane_b32 v40, s20, 16
	; GISEL-NEXT: v_writelane_b32 v40, s21, 17			; GISEL-NEXT: v_writelane_b32 v40, s21, 17
	; GISEL-NEXT: v_writelane_b32 v40, s22, 18			; GISEL-NEXT: v_writelane_b32 v40, s22, 18
	; GISEL-NEXT: v_writelane_b32 v40, s23, 19			; GISEL-NEXT: v_writelane_b32 v40, s23, 19
				; GISEL-NEXT: s_mov_b32 s36, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s24, 20			; GISEL-NEXT: v_writelane_b32 v40, s24, 20
	; GISEL-NEXT: v_writelane_b32 v40, s25, 21			; GISEL-NEXT: v_writelane_b32 v40, s25, 21
	; GISEL-NEXT: s_getpc_b64 s[34:35]			; GISEL-NEXT: s_getpc_b64 s[34:35]
	; GISEL-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4			; GISEL-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4
	; GISEL-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12			; GISEL-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12
	; GISEL-NEXT: v_writelane_b32 v40, s26, 22			; GISEL-NEXT: v_writelane_b32 v40, s26, 22
	[30 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s10, v40, 6			; GISEL-NEXT: v_readlane_b32 s10, v40, 6
	; GISEL-NEXT: v_readlane_b32 s9, v40, 5			; GISEL-NEXT: v_readlane_b32 s9, v40, 5
	; GISEL-NEXT: v_readlane_b32 s8, v40, 4			; GISEL-NEXT: v_readlane_b32 s8, v40, 4
	; GISEL-NEXT: v_readlane_b32 s7, v40, 3			; GISEL-NEXT: v_readlane_b32 s7, v40, 3
	; GISEL-NEXT: v_readlane_b32 s6, v40, 2			; GISEL-NEXT: v_readlane_b32 s6, v40, 2
	; GISEL-NEXT: v_readlane_b32 s5, v40, 1			; GISEL-NEXT: v_readlane_b32 s5, v40, 1
	; GISEL-NEXT: v_readlane_b32 s4, v40, 0			; GISEL-NEXT: v_readlane_b32 s4, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 28			; GISEL-NEXT: s_mov_b32 s33, s36
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void @extern_c_func()			call void @extern_c_func()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll

	[93 lines not shown]
	declare hidden amdgpu_gfx void @external_void_func_v16i8(<16 x i8>) #0			declare hidden amdgpu_gfx void @external_void_func_v16i8(<16 x i8>) #0

	define amdgpu_gfx void @test_call_external_void_func_i1_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_imm:			; GFX9-LABEL: test_call_external_void_func_i1_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm:			; GFX10-LABEL: test_call_external_void_func_i1_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_imm:			; GFX11-LABEL: test_call_external_void_func_i1_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 1			; GFX11-NEXT: v_mov_b32_e32 v0, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: scratch_store_b8 off, v0, s32			; GFX11-NEXT: scratch_store_b8 off, v0, s32
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i1(i1 true)			call amdgpu_gfx void @external_void_func_i1(i1 true)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_signext:			; GFX9-LABEL: test_call_external_void_func_i1_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12
	; GFX9-NEXT: v_and_b32_e32 v0, 1, v0			; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_signext:			; GFX10-LABEL: test_call_external_void_func_i1_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_signext:			; GFX11-LABEL: test_call_external_void_func_i1_signext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc			; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: v_and_b32_e32 v0, 1, v0			; GFX11-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX11-NEXT: scratch_store_b8 off, v0, s32			; GFX11-NEXT: scratch_store_b8 off, v0, s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i1, ptr addrspace(1) undef			%var = load volatile i1, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)			call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i1_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_zeroext:			; GFX9-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12
	; GFX9-NEXT: v_and_b32_e32 v0, 1, v0			; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_zeroext:			; GFX11-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc			; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: v_and_b32_e32 v0, 1, v0			; GFX11-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX11-NEXT: scratch_store_b8 off, v0, s32			; GFX11-NEXT: scratch_store_b8 off, v0, s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i1, ptr addrspace(1) undef			%var = load volatile i1, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)			call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_imm:			; GFX9-LABEL: test_call_external_void_func_i8_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm:			; GFX10-LABEL: test_call_external_void_func_i8_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_imm:			; GFX11-LABEL: test_call_external_void_func_i8_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i8(i8 123)			call amdgpu_gfx void @external_void_func_i8(i8 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_signext:			; GFX9-LABEL: test_call_external_void_func_i8_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_signext:			; GFX10-LABEL: test_call_external_void_func_i8_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_signext:			; GFX11-LABEL: test_call_external_void_func_i8_signext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_i8 v0, v[0:1], off glc dlc			; GFX11-NEXT: global_load_i8 v0, v[0:1], off glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i8, ptr addrspace(1) undef			%var = load volatile i8, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)			call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_zeroext:			; GFX9-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_zeroext:			; GFX11-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc			; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i8, ptr addrspace(1) undef			%var = load volatile i8, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)			call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_imm:			; GFX9-LABEL: test_call_external_void_func_i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm:			; GFX10-LABEL: test_call_external_void_func_i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_imm:			; GFX11-LABEL: test_call_external_void_func_i16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i16(i16 123)			call amdgpu_gfx void @external_void_func_i16(i16 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_signext:			; GFX9-LABEL: test_call_external_void_func_i16_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc			; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_signext:			; GFX10-LABEL: test_call_external_void_func_i16_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_signext:			; GFX11-LABEL: test_call_external_void_func_i16_signext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc			; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i16, ptr addrspace(1) undef			%var = load volatile i16, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)			call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_zeroext:			; GFX9-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc			; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_zeroext:			; GFX11-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc			; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i16, ptr addrspace(1) undef			%var = load volatile i16, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)			call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i32_imm:			; GFX9-LABEL: test_call_external_void_func_i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 42			; GFX9-NEXT: v_mov_b32_e32 v0, 42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm:			; GFX10-LABEL: test_call_external_void_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 42			; GFX10-NEXT: v_mov_b32_e32 v0, 42
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i32_imm:			; GFX11-LABEL: test_call_external_void_func_i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 42			; GFX11-NEXT: v_mov_b32_e32 v0, 42
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i32(i32 42)			call amdgpu_gfx void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i64_imm:			; GFX9-LABEL: test_call_external_void_func_i64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm:			; GFX10-LABEL: test_call_external_void_func_i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i64_imm:			; GFX11-LABEL: test_call_external_void_func_i64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0x7b :: v_dual_mov_b32 v1, 0			; GFX11-NEXT: v_dual_mov_b32 v0, 0x7b :: v_dual_mov_b32 v1, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i64(i64 123)			call amdgpu_gfx void @external_void_func_i64(i64 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64:			; GFX9-LABEL: test_call_external_void_func_v2i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64:			; GFX10-LABEL: test_call_external_void_func_v2i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64:			; GFX11-LABEL: test_call_external_void_func_v2i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i64>, ptr addrspace(1) null			%val = load <2 x i64>, ptr addrspace(1) null
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_imm:			; GFX9-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64_imm:			; GFX11-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
	; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4			; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i64:			; GFX9-LABEL: test_call_external_void_func_v3i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v4, 1			; GFX9-NEXT: v_mov_b32_e32 v4, 1
	; GFX9-NEXT: v_mov_b32_e32 v5, 2			; GFX9-NEXT: v_mov_b32_e32 v5, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64:			; GFX10-LABEL: test_call_external_void_func_v3i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 1			; GFX10-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-NEXT: v_mov_b32_e32 v5, 2			; GFX10-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i64:			; GFX11-LABEL: test_call_external_void_func_v3i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off			; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
				; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(1) null			%load = load <2 x i64>, ptr addrspace(1) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>

	call amdgpu_gfx void @external_void_func_v3i64(<3 x i64> %val)			call amdgpu_gfx void @external_void_func_v3i64(<3 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i64:			; GFX9-LABEL: test_call_external_void_func_v4i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v4, 1			; GFX9-NEXT: v_mov_b32_e32 v4, 1
	; GFX9-NEXT: v_mov_b32_e32 v5, 2			; GFX9-NEXT: v_mov_b32_e32 v5, 2
	; GFX9-NEXT: v_mov_b32_e32 v6, 3			; GFX9-NEXT: v_mov_b32_e32 v6, 3
	; GFX9-NEXT: v_mov_b32_e32 v7, 4			; GFX9-NEXT: v_mov_b32_e32 v7, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64:			; GFX10-LABEL: test_call_external_void_func_v4i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 1			; GFX10-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-NEXT: v_mov_b32_e32 v5, 2			; GFX10-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-NEXT: v_mov_b32_e32 v6, 3			; GFX10-NEXT: v_mov_b32_e32 v6, 3
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v7, 4			; GFX10-NEXT: v_mov_b32_e32 v7, 4
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i64:			; GFX11-LABEL: test_call_external_void_func_v4i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2
	; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1			; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v6, 3 :: v_dual_mov_b32 v7, 4			; GFX11-NEXT: v_dual_mov_b32 v6, 3 :: v_dual_mov_b32 v7, 4
	; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off			; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(1) null			%load = load <2 x i64>, ptr addrspace(1) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)			call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f16_imm:			; GFX9-LABEL: test_call_external_void_func_f16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm:			; GFX10-LABEL: test_call_external_void_func_f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f16_imm:			; GFX11-LABEL: test_call_external_void_func_f16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX11-NEXT: v_mov_b32_e32 v0, 0x4400
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f16(half 4.0)			call amdgpu_gfx void @external_void_func_f16(half 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f32_imm:			; GFX9-LABEL: test_call_external_void_func_f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 4.0			; GFX9-NEXT: v_mov_b32_e32 v0, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm:			; GFX10-LABEL: test_call_external_void_func_f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 4.0			; GFX10-NEXT: v_mov_b32_e32 v0, 4.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f32_imm:			; GFX11-LABEL: test_call_external_void_func_f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 4.0			; GFX11-NEXT: v_mov_b32_e32 v0, 4.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f32(float 4.0)			call amdgpu_gfx void @external_void_func_f32(float 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f32_imm:			; GFX9-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f32_imm:			; GFX11-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0			; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f32_imm:			; GFX9-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 4.0			; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f32_imm:			; GFX11-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0			; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0
	; GFX11-NEXT: v_mov_b32_e32 v2, 4.0			; GFX11-NEXT: v_mov_b32_e32 v2, 4.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5f32_imm:			; GFX9-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 4.0			; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX9-NEXT: v_mov_b32_e32 v3, -1.0			; GFX9-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX9-NEXT: v_mov_b32_e32 v4, 0.5			; GFX9-NEXT: v_mov_b32_e32 v4, 0.5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-NEXT: v_mov_b32_e32 v3, -1.0			; GFX10-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 0.5			; GFX10-NEXT: v_mov_b32_e32 v4, 0.5
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5f32_imm:			; GFX11-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0			; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0
	; GFX11-NEXT: v_dual_mov_b32 v2, 4.0 :: v_dual_mov_b32 v3, -1.0			; GFX11-NEXT: v_dual_mov_b32 v2, 4.0 :: v_dual_mov_b32 v3, -1.0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v4, 0.5			; GFX11-NEXT: v_mov_b32_e32 v4, 0.5
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f64_imm:			; GFX9-LABEL: test_call_external_void_func_f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm:			; GFX10-LABEL: test_call_external_void_func_f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f64_imm:			; GFX11-LABEL: test_call_external_void_func_f64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x40100000			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x40100000
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f64(double 4.0)			call amdgpu_gfx void @external_void_func_f64(double 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f64_imm:			; GFX9-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f64_imm:			; GFX11-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0
	; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000			; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f64_imm:			; GFX9-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX9-NEXT: v_mov_b32_e32 v4, 0			; GFX9-NEXT: v_mov_b32_e32 v4, 0
	; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 0			; GFX10-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f64_imm:			; GFX11-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0
	; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000			; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 0x40200000			; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 0x40200000
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i16:			; GFX9-LABEL: test_call_external_void_func_v2i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dword v0, v[0:1], off			; GFX9-NEXT: global_load_dword v0, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16:			; GFX10-LABEL: test_call_external_void_func_v2i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i16:			; GFX11-LABEL: test_call_external_void_func_v2i16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b32 v0, v[0:1], off			; GFX11-NEXT: global_load_b32 v0, v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i16>, ptr addrspace(1) undef			%val = load <2 x i16>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)			call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16:			; GFX9-LABEL: test_call_external_void_func_v3i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16:			; GFX10-LABEL: test_call_external_void_func_v3i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16:			; GFX11-LABEL: test_call_external_void_func_v3i16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off			; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x i16>, ptr addrspace(1) undef			%val = load <3 x i16>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16:			; GFX9-LABEL: test_call_external_void_func_v3f16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16:			; GFX10-LABEL: test_call_external_void_func_v3f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16:			; GFX11-LABEL: test_call_external_void_func_v3f16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off			; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x half>, ptr addrspace(1) undef			%val = load <3 x half>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_imm:			; GFX9-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: v_mov_b32_e32 v1, 3			; GFX9-NEXT: v_mov_b32_e32 v1, 3
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-NEXT: v_mov_b32_e32 v1, 3			; GFX10-NEXT: v_mov_b32_e32 v1, 3
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16_imm:			; GFX11-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0x20001 :: v_dual_mov_b32 v1, 3			; GFX11-NEXT: v_dual_mov_b32 v0, 0x20001 :: v_dual_mov_b32 v1, 3
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_imm:			; GFX9-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16_imm:			; GFX11-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX11-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX11-NEXT: v_mov_b32_e32 v1, 0x4400
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16:			; GFX9-LABEL: test_call_external_void_func_v4i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16:			; GFX10-LABEL: test_call_external_void_func_v4i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16:			; GFX11-LABEL: test_call_external_void_func_v4i16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off			; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i16>, ptr addrspace(1) undef			%val = load <4 x i16>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_imm:			; GFX9-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16_imm:			; GFX11-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX11-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX11-NEXT: v_mov_b32_e32 v1, 0x40003
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f16:			; GFX9-LABEL: test_call_external_void_func_v2f16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dword v0, v[0:1], off			; GFX9-NEXT: global_load_dword v0, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16:			; GFX10-LABEL: test_call_external_void_func_v2f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f16:			; GFX11-LABEL: test_call_external_void_func_v2f16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b32 v0, v[0:1], off			; GFX11-NEXT: global_load_b32 v0, v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x half>, ptr addrspace(1) undef			%val = load <2 x half>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)			call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32:			; GFX9-LABEL: test_call_external_void_func_v2i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32:			; GFX10-LABEL: test_call_external_void_func_v2i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32:			; GFX11-LABEL: test_call_external_void_func_v2i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off			; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i32>, ptr addrspace(1) undef			%val = load <2 x i32>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_imm:			; GFX9-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32_imm:			; GFX11-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_imm:			; GFX9-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: v_mov_b32_e32 v2, 5			; GFX9-NEXT: v_mov_b32_e32 v2, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 4			; GFX10-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-NEXT: v_mov_b32_e32 v2, 5			; GFX10-NEXT: v_mov_b32_e32 v2, 5
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_imm:			; GFX11-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4			; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4
	; GFX11-NEXT: v_mov_b32_e32 v2, 5			; GFX11-NEXT: v_mov_b32_e32 v2, 5
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_i32:			; GFX9-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: v_mov_b32_e32 v2, 5			; GFX9-NEXT: v_mov_b32_e32 v2, 5
	; GFX9-NEXT: v_mov_b32_e32 v3, 6			; GFX9-NEXT: v_mov_b32_e32 v3, 6
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 4			; GFX10-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-NEXT: v_mov_b32_e32 v2, 5			; GFX10-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-NEXT: v_mov_b32_e32 v3, 6			; GFX10-NEXT: v_mov_b32_e32 v3, 6
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_i32:			; GFX11-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4			; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4
	; GFX11-NEXT: v_dual_mov_b32 v2, 5 :: v_dual_mov_b32 v3, 6			; GFX11-NEXT: v_dual_mov_b32 v2, 5 :: v_dual_mov_b32 v3, 6
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)			call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32:			; GFX9-LABEL: test_call_external_void_func_v4i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32:			; GFX10-LABEL: test_call_external_void_func_v4i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32:			; GFX11-LABEL: test_call_external_void_func_v4i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off			; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i32>, ptr addrspace(1) undef			%val = load <4 x i32>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_imm:			; GFX9-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32_imm:			; GFX11-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
	; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4			; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5i32_imm:			; GFX9-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_mov_b32_e32 v4, 5			; GFX9-NEXT: v_mov_b32_e32 v4, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 5			; GFX10-NEXT: v_mov_b32_e32 v4, 5
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5i32_imm:			; GFX11-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
	; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4			; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v4, 5			; GFX11-NEXT: v_mov_b32_e32 v4, 5
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32:			; GFX9-LABEL: test_call_external_void_func_v8i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v8, 0			; GFX9-NEXT: v_mov_b32_e32 v8, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32:			; GFX10-LABEL: test_call_external_void_func_v8i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v8, 0			; GFX10-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32:			; GFX11-LABEL: test_call_external_void_func_v8i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, 0			; GFX11-NEXT: v_mov_b32_e32 v4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_b128 v[0:3], v4, s[0:1]			; GFX11-NEXT: global_load_b128 v[0:3], v4, s[0:1]
	; GFX11-NEXT: global_load_b128 v[4:7], v4, s[0:1] offset:16			; GFX11-NEXT: global_load_b128 v[4:7], v4, s[0:1] offset:16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <8 x i32>, ptr addrspace(1) %ptr			%val = load <8 x i32>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_imm:			; GFX9-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_mov_b32_e32 v4, 5			; GFX9-NEXT: v_mov_b32_e32 v4, 5
	; GFX9-NEXT: v_mov_b32_e32 v5, 6			; GFX9-NEXT: v_mov_b32_e32 v5, 6
	; GFX9-NEXT: v_mov_b32_e32 v6, 7			; GFX9-NEXT: v_mov_b32_e32 v6, 7
	; GFX9-NEXT: v_mov_b32_e32 v7, 8			; GFX9-NEXT: v_mov_b32_e32 v7, 8
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 5			; GFX10-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-NEXT: v_mov_b32_e32 v5, 6			; GFX10-NEXT: v_mov_b32_e32 v5, 6
	; GFX10-NEXT: v_mov_b32_e32 v6, 7			; GFX10-NEXT: v_mov_b32_e32 v6, 7
	; GFX10-NEXT: v_mov_b32_e32 v7, 8			; GFX10-NEXT: v_mov_b32_e32 v7, 8
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32_imm:			; GFX11-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
	; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4			; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v4, 5 :: v_dual_mov_b32 v5, 6			; GFX11-NEXT: v_dual_mov_b32 v4, 5 :: v_dual_mov_b32 v5, 6
	; GFX11-NEXT: v_dual_mov_b32 v6, 7 :: v_dual_mov_b32 v7, 8			; GFX11-NEXT: v_dual_mov_b32 v6, 7 :: v_dual_mov_b32 v7, 8
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i32:			; GFX9-LABEL: test_call_external_void_func_v16i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v16, 0			; GFX9-NEXT: v_mov_b32_e32 v16, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32:			; GFX10-LABEL: test_call_external_void_func_v16i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v16, 0			; GFX10-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x3			; GFX10-NEXT: s_clause 0x3
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v16i32:			; GFX11-LABEL: test_call_external_void_func_v16i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v12, 0			; GFX11-NEXT: v_mov_b32_e32 v12, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: global_load_b128 v[0:3], v12, s[0:1]			; GFX11-NEXT: global_load_b128 v[0:3], v12, s[0:1]
	; GFX11-NEXT: global_load_b128 v[4:7], v12, s[0:1] offset:16			; GFX11-NEXT: global_load_b128 v[4:7], v12, s[0:1] offset:16
	; GFX11-NEXT: global_load_b128 v[8:11], v12, s[0:1] offset:32			; GFX11-NEXT: global_load_b128 v[8:11], v12, s[0:1] offset:32
	; GFX11-NEXT: global_load_b128 v[12:15], v12, s[0:1] offset:48			; GFX11-NEXT: global_load_b128 v[12:15], v12, s[0:1] offset:48
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x3			; GFX10-SCRATCH-NEXT: s_clause 0x3
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <16 x i32>, ptr addrspace(1) %ptr			%val = load <16 x i32>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v16i32(<16 x i32> %val)			call amdgpu_gfx void @external_void_func_v16i32(<16 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32:			; GFX9-LABEL: test_call_external_void_func_v32i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v28, 0			; GFX9-NEXT: v_mov_b32_e32 v28, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48
	; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64			; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64
	; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80			; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80
	; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96			; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112			; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32:			; GFX10-LABEL: test_call_external_void_func_v32i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v32, 0			; GFX10-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x7			; GFX10-NEXT: s_clause 0x7
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48
	; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64			; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64
	; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80			; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80
	; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96			; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96
	; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112			; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32:			; GFX11-LABEL: test_call_external_void_func_v32i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v28, 0			; GFX11-NEXT: v_mov_b32_e32 v28, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x7			; GFX11-NEXT: s_clause 0x7
	; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]			; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]
	; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16			; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16
	; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32			; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32
	; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48			; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48
	; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64			; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64
	; GFX11-NEXT: global_load_b128 v[20:23], v28, s[0:1] offset:80			; GFX11-NEXT: global_load_b128 v[20:23], v28, s[0:1] offset:80
	; GFX11-NEXT: global_load_b128 v[24:27], v28, s[0:1] offset:96			; GFX11-NEXT: global_load_b128 v[24:27], v28, s[0:1] offset:96
	; GFX11-NEXT: global_load_b128 v[28:31], v28, s[0:1] offset:112			; GFX11-NEXT: global_load_b128 v[28:31], v28, s[0:1] offset:112
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x7			; GFX10-SCRATCH-NEXT: s_clause 0x7
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <32 x i32>, ptr addrspace(1) %ptr			%val = load <32 x i32>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v32i32(<32 x i32> %val)			call amdgpu_gfx void @external_void_func_v32i32(<32 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32_i32:			; GFX9-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v28, 0			; GFX9-NEXT: v_mov_b32_e32 v28, 0
	; GFX9-NEXT: global_load_dword v32, v[0:1], off			; GFX9-NEXT: global_load_dword v32, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48
	; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64			; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64
	; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80			; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80
	; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96			; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112			; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12
	; GFX9-NEXT: s_waitcnt vmcnt(8)			; GFX9-NEXT: s_waitcnt vmcnt(8)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v32, 0			; GFX10-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: global_load_dword v33, v[0:1], off			; GFX10-NEXT: global_load_dword v33, v[0:1], off
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x7			; GFX10-NEXT: s_clause 0x7
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48
	; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64			; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64
	; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80			; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80
	; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96			; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96
	; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112			; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt vmcnt(8)			; GFX10-NEXT: s_waitcnt vmcnt(8)
	; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32_i32:			; GFX11-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v28, 0			; GFX11-NEXT: v_mov_b32_e32 v28, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: global_load_b32 v32, v[0:1], off			; GFX11-NEXT: global_load_b32 v32, v[0:1], off
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x7			; GFX11-NEXT: s_clause 0x7
	; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]			; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]
	; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16			; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16
	; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32			; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32
	; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48			; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48
	; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64			; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64
	; GFX11-NEXT: global_load_b128 v[20:23], v28, s[0:1] offset:80			; GFX11-NEXT: global_load_b128 v[20:23], v28, s[0:1] offset:80
	; GFX11-NEXT: global_load_b128 v[24:27], v28, s[0:1] offset:96			; GFX11-NEXT: global_load_b128 v[24:27], v28, s[0:1] offset:96
	; GFX11-NEXT: global_load_b128 v[28:31], v28, s[0:1] offset:112			; GFX11-NEXT: global_load_b128 v[28:31], v28, s[0:1] offset:112
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_waitcnt vmcnt(8)			; GFX11-NEXT: s_waitcnt vmcnt(8)
	; GFX11-NEXT: scratch_store_b32 off, v32, s32			; GFX11-NEXT: scratch_store_b32 off, v32, s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x7			; GFX10-SCRATCH-NEXT: s_clause 0x7
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(8)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(8)
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v33, s32			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v33, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef			%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef
	%val0 = load <32 x i32>, ptr addrspace(1) %ptr0			%val0 = load <32 x i32>, ptr addrspace(1) %ptr0
	%val1 = load i32, ptr addrspace(1) undef			%val1 = load i32, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v32i32_i32(<32 x i32> %val0, i32 %val1)			call amdgpu_gfx void @external_void_func_v32i32_i32(<32 x i32> %val0, i32 %val1)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_i32_func_i32_imm(ptr addrspace(1) %out) #0 {			define amdgpu_gfx void @test_call_external_i32_func_i32_imm(ptr addrspace(1) %out) #0 {
	; GFX9-LABEL: test_call_external_i32_func_i32_imm:			; GFX9-LABEL: test_call_external_i32_func_i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v43, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v41, v0			; GFX9-NEXT: v_mov_b32_e32 v41, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, 42			; GFX9-NEXT: v_mov_b32_e32 v0, 42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_mov_b32_e32 v42, v1			; GFX9-NEXT: v_mov_b32_e32 v42, v1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: global_store_dword v[41:42], v0, off			; GFX9-NEXT: global_store_dword v[41:42], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v43, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v43, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, 42			; GFX10-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: v_mov_b32_e32 v42, v1			; GFX10-NEXT: v_mov_b32_e32 v42, v1
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: global_store_dword v[41:42], v0, off			; GFX10-NEXT: global_store_dword v[41:42], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v43, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8
				; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_i32_func_i32_imm:			; GFX11-LABEL: test_call_external_i32_func_i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:8
				; GFX11-NEXT: scratch_store_b32 off, v43, s32 offset:12
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v43, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
	; GFX11-NEXT: scratch_store_b32 off, v42, s33			; GFX11-NEXT: scratch_store_b32 off, v42, s33
	; GFX11-NEXT: v_dual_mov_b32 v42, v1 :: v_dual_mov_b32 v41, v0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_dual_mov_b32 v42, v1 :: v_dual_mov_b32 v41, v0
	; GFX11-NEXT: v_mov_b32_e32 v0, 42			; GFX11-NEXT: v_mov_b32_e32 v0, 42
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 32
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: global_store_b32 v[41:42], v0, off dlc			; GFX11-NEXT: global_store_b32 v[41:42], v0, off dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_load_b32 v42, off, s33			; GFX11-NEXT: scratch_load_b32 v42, off, s33
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v43, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:8
				; GFX11-NEXT: scratch_load_b32 v43, off, s32 offset:12
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v43, s32 offset:12 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v43, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s33 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s33 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v42, v1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v42, v1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: global_store_dword v[41:42], v0, off			; GFX10-SCRATCH-NEXT: global_store_dword v[41:42], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s33			; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s33
	; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v43, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8
				; GFX10-SCRATCH-NEXT: scratch_load_dword v43, off, s32 offset:12
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)			%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)
	store volatile i32 %val, ptr addrspace(1) %out			store volatile i32 %val, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_struct_i8_i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_struct_i8_i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX9-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_ubyte v0, v2, s[34:35]			; GFX9-NEXT: global_load_ubyte v0, v2, s[34:35]
	; GFX9-NEXT: global_load_dword v1, v2, s[34:35] offset:4			; GFX9-NEXT: global_load_dword v1, v2, s[34:35] offset:4
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_ubyte v0, v2, s[34:35]			; GFX10-NEXT: global_load_ubyte v0, v2, s[34:35]
	; GFX10-NEXT: global_load_dword v1, v2, s[34:35] offset:4			; GFX10-NEXT: global_load_dword v1, v2, s[34:35] offset:4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX11-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: global_load_u8 v0, v1, s[0:1]			; GFX11-NEXT: global_load_u8 v0, v1, s[0:1]
	; GFX11-NEXT: global_load_b32 v1, v1, s[0:1] offset:4			; GFX11-NEXT: global_load_b32 v1, v1, s[0:1] offset:4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4			; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef			%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load { i8, i32 }, ptr addrspace(1) %ptr0			%val = load { i8, i32 }, ptr addrspace(1) %ptr0
	call amdgpu_gfx void @external_void_func_struct_i8_i32({ i8, i32 } %val)			call amdgpu_gfx void @external_void_func_struct_i8_i32({ i8, i32 } %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_byval_struct_i8_i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_byval_struct_i8_i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX9-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 8			; GFX10-NEXT: v_mov_b32_e32 v1, 8
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
				; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX11-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:8
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:12
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2
	; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8			; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b8 off, v0, s33			; GFX11-NEXT: scratch_store_b8 off, v0, s33
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4
	; GFX11-NEXT: v_mov_b32_e32 v0, s33			; GFX11-NEXT: v_mov_b32_e32 v0, s33
				; GFX11-NEXT: s_add_i32 s32, s32, 32
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:8
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:12
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:12 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:12
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = alloca { i8, i32 }, align 4, addrspace(5)			%val = alloca { i8, i32 }, align 4, addrspace(5)
	%gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 1
	store i8 3, ptr addrspace(5) %gep0			store i8 3, ptr addrspace(5) %gep0
	store i32 8, ptr addrspace(5) %gep1			store i32 8, ptr addrspace(5) %gep1
	call amdgpu_gfx void @external_void_func_byval_struct_i8_i32(ptr addrspace(5) byval({ i8, i32 }) %val)			call amdgpu_gfx void @external_void_func_byval_struct_i8_i32(ptr addrspace(5) byval({ i8, i32 }) %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX9-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_add_u32_e32 v0, 8, v0			; GFX9-NEXT: v_add_u32_e32 v0, 8, v0
	; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8			; GFX9-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_byte v[0:1], v0, off			; GFX9-NEXT: global_store_byte v[0:1], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_dword v[0:1], v1, off			; GFX9-NEXT: global_store_dword v[0:1], v1, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 8			; GFX10-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x400			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33
				; GFX10-NEXT: s_addk_i32 s32, 0x400
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_add_nc_u32_e32 v0, 8, v0			; GFX10-NEXT: v_add_nc_u32_e32 v0, 8, v0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_store_byte v[0:1], v0, off			; GFX10-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: global_store_dword v[0:1], v1, off			; GFX10-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX11-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:16 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:16
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:20
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2
	; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8			; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 32			; GFX11-NEXT: s_add_i32 s32, s32, 32
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX11-NEXT: s_add_i32 vcc_lo, s33, 8			; GFX11-NEXT: s_add_i32 vcc_lo, s33, 8
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b8 off, v0, s33			; GFX11-NEXT: scratch_store_b8 off, v0, s33
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4
	; GFX11-NEXT: v_dual_mov_b32 v0, vcc_lo :: v_dual_mov_b32 v1, s33			; GFX11-NEXT: v_dual_mov_b32 v0, vcc_lo :: v_dual_mov_b32 v1, s33
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_load_u8 v0, off, s33 offset:8			; GFX11-NEXT: scratch_load_u8 v0, off, s33 offset:8
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v1, off, s33 offset:12
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_store_b8 v[0:1], v0, off dlc			; GFX11-NEXT: global_store_b8 v[0:1], v0, off dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: global_store_b32 v[0:1], v1, off dlc			; GFX11-NEXT: global_store_b32 v[0:1], v1, off dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:16 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:16
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:20
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:16 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:16 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:20 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8			; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8			; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8
	; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12			; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off			; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off			; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:16 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:16
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:20
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%in.val = alloca { i8, i32 }, align 4, addrspace(5)			%in.val = alloca { i8, i32 }, align 4, addrspace(5)
	%out.val = alloca { i8, i32 }, align 4, addrspace(5)			%out.val = alloca { i8, i32 }, align 4, addrspace(5)
	%in.gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 0			%in.gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 0
	%in.gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 1			%in.gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 1
	[... 11 lines not shown ...]
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i8() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i8() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i8:			; GFX9-LABEL: test_call_external_void_func_v16i8:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4
	[... 16 lines not shown ...]
	; GFX9-NEXT: v_mov_b32_e32 v12, v3			; GFX9-NEXT: v_mov_b32_e32 v12, v3
	; GFX9-NEXT: v_mov_b32_e32 v1, v16			; GFX9-NEXT: v_mov_b32_e32 v1, v16
	; GFX9-NEXT: v_mov_b32_e32 v2, v17			; GFX9-NEXT: v_mov_b32_e32 v2, v17
	; GFX9-NEXT: v_mov_b32_e32 v3, v18			; GFX9-NEXT: v_mov_b32_e32 v3, v18
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i8:			; GFX10-LABEL: test_call_external_void_func_v16i8:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	[... 14 lines not shown ...]
	; GFX10-NEXT: v_mov_b32_e32 v12, v3			; GFX10-NEXT: v_mov_b32_e32 v12, v3
	; GFX10-NEXT: v_mov_b32_e32 v1, v16			; GFX10-NEXT: v_mov_b32_e32 v1, v16
	; GFX10-NEXT: v_mov_b32_e32 v2, v17			; GFX10-NEXT: v_mov_b32_e32 v2, v17
	; GFX10-NEXT: v_mov_b32_e32 v3, v18			; GFX10-NEXT: v_mov_b32_e32 v3, v18
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v16i8:			; GFX11-LABEL: test_call_external_void_func_v16i8:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_b128 v[0:3], v0, s[0:1]			; GFX11-NEXT: global_load_b128 v[0:3], v0, s[0:1]
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_lshrrev_b32_e32 v16, 8, v0			; GFX11-NEXT: v_lshrrev_b32_e32 v16, 8, v0
	[... 11 lines not shown ...]
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v8, v2			; GFX11-NEXT: v_mov_b32_e32 v8, v2
	; GFX11-NEXT: v_dual_mov_b32 v12, v3 :: v_dual_mov_b32 v3, v18			; GFX11-NEXT: v_dual_mov_b32 v12, v3 :: v_dual_mov_b32 v3, v18
	; GFX11-NEXT: v_dual_mov_b32 v1, v16 :: v_dual_mov_b32 v2, v17			; GFX11-NEXT: v_dual_mov_b32 v1, v16 :: v_dual_mov_b32 v2, v17
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	[... 14 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, v3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, v3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, v16			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, v16
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, v17			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, v17
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, v18			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, v18
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <16 x i8>, ptr addrspace(1) %ptr			%val = load <16 x i8>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)			call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)
	ret void			ret void
	}			}

	define void @tail_call_byval_align16(<32 x i32> %val, double %tmp) #0 {			define void @tail_call_byval_align16(<32 x i32> %val, double %tmp) #0 {
	; GFX9-LABEL: tail_call_byval_align16:			; GFX9-LABEL: tail_call_byval_align16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 32			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20			; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20
	; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16			; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16
	; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s33			; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s33
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_writelane_b32 v40, s34, 2			; GFX9-NEXT: v_writelane_b32 v40, s34, 2
	; GFX9-NEXT: v_writelane_b32 v40, s35, 3			; GFX9-NEXT: v_writelane_b32 v40, s35, 3
	; GFX9-NEXT: v_readlane_b32 s38, v40, 6			; GFX9-NEXT: v_readlane_b32 s38, v40, 6
	; GFX9-NEXT: v_readlane_b32 s37, v40, 5			; GFX9-NEXT: v_readlane_b32 s37, v40, 5
	; GFX9-NEXT: v_readlane_b32 s36, v40, 4			; GFX9-NEXT: v_readlane_b32 s36, v40, 4
	; GFX9-NEXT: v_readlane_b32 s35, v40, 3			; GFX9-NEXT: v_readlane_b32 s35, v40, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 32			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: tail_call_byval_align16:			; GFX10-LABEL: tail_call_byval_align16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry

	; inreg arguments are put in sgprs			; inreg arguments are put in sgprs
	define amdgpu_gfx void @test_call_external_void_func_i1_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, 1			; GFX11-NEXT: v_mov_b32_e32 v0, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: scratch_store_b8 off, v0, s32			; GFX11-NEXT: scratch_store_b8 off, v0, s32
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)			call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_movk_i32 s4, 0x7b
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_movk_i32 s4, 0x7b
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)			call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_movk_i32 s4, 0x7b
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_movk_i32 s4, 0x7b
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)			call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s4, 42			; GFX9-NEXT: s_mov_b32 s4, 42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 42
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 42
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 42
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 42
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)			call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: s_mov_b32 s5, 0			; GFX9-NEXT: s_mov_b32 s5, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_movk_i32 s4, 0x7b
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_movk_i32 s4, 0x7b
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)			call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: s_mov_b64 s[34:35], 0			; GFX9-NEXT: s_mov_b64 s[34:35], 0
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b64 s[34:35], 0			; GFX10-NEXT: s_mov_b64 s[34:35], 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 6			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b64 s[0:1], 0			; GFX11-NEXT: s_mov_b64 s[0:1], 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 4			; GFX11-NEXT: v_writelane_b32 v40, s30, 4
	; GFX11-NEXT: v_writelane_b32 v40, s31, 5			; GFX11-NEXT: v_writelane_b32 v40, s31, 5
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 5			; GFX11-NEXT: v_readlane_b32 s31, v40, 5
	; GFX11-NEXT: v_readlane_b32 s30, v40, 4			; GFX11-NEXT: v_readlane_b32 s30, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 6			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i64>, ptr addrspace(4) null			%val = load <2 x i64>, ptr addrspace(4) null
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 6			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 3			; GFX11-NEXT: s_mov_b32 s6, 3
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 4			; GFX11-NEXT: s_mov_b32 s7, 4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 4			; GFX11-NEXT: v_writelane_b32 v40, s30, 4
	; GFX11-NEXT: v_writelane_b32 v40, s31, 5			; GFX11-NEXT: v_writelane_b32 v40, s31, 5
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 5			; GFX11-NEXT: v_readlane_b32 s31, v40, 5
	; GFX11-NEXT: v_readlane_b32 s30, v40, 4			; GFX11-NEXT: v_readlane_b32 s30, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 6			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 8
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: s_mov_b64 s[34:35], 0			; GFX9-NEXT: s_mov_b64 s[34:35], 0
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 6			; GFX9-NEXT: v_writelane_b32 v40, s30, 6
	; GFX9-NEXT: s_mov_b32 s8, 1			; GFX9-NEXT: s_mov_b32 s8, 1
	; GFX9-NEXT: s_mov_b32 s9, 2			; GFX9-NEXT: s_mov_b32 s9, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 7			; GFX9-NEXT: v_writelane_b32 v40, s31, 7
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 7			; GFX9-NEXT: v_readlane_b32 s31, v40, 7
	; GFX9-NEXT: v_readlane_b32 s30, v40, 6			; GFX9-NEXT: v_readlane_b32 s30, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 8			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b64 s[34:35], 0			; GFX10-NEXT: s_mov_b64 s[34:35], 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 1			; GFX10-NEXT: s_mov_b32 s8, 1
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: s_mov_b32 s9, 2			; GFX10-NEXT: s_mov_b32 s9, 2
	; GFX10-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 8			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b64 s[0:1], 0			; GFX11-NEXT: s_mov_b64 s[0:1], 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: s_mov_b32 s8, 1			; GFX11-NEXT: s_mov_b32 s8, 1
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: s_mov_b32 s9, 2			; GFX11-NEXT: s_mov_b32 s9, 2
	; GFX11-NEXT: v_writelane_b32 v40, s30, 6			; GFX11-NEXT: v_writelane_b32 v40, s30, 6
	; GFX11-NEXT: v_writelane_b32 v40, s31, 7			; GFX11-NEXT: v_writelane_b32 v40, s31, 7
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 7			; GFX11-NEXT: v_readlane_b32 s31, v40, 7
	; GFX11-NEXT: v_readlane_b32 s30, v40, 6			; GFX11-NEXT: v_readlane_b32 s30, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 8			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(4) null			%load = load <2 x i64>, ptr addrspace(4) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>

	call amdgpu_gfx void @external_void_func_v3i64_inreg(<3 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v3i64_inreg(<3 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_mov_b64 s[34:35], 0			; GFX9-NEXT: s_mov_b64 s[34:35], 0
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 8			; GFX9-NEXT: v_writelane_b32 v40, s30, 8
	; GFX9-NEXT: s_mov_b32 s8, 1			; GFX9-NEXT: s_mov_b32 s8, 1
	; GFX9-NEXT: s_mov_b32 s9, 2			; GFX9-NEXT: s_mov_b32 s9, 2
	; GFX9-NEXT: s_mov_b32 s10, 3			; GFX9-NEXT: s_mov_b32 s10, 3
	; GFX9-NEXT: s_mov_b32 s11, 4			; GFX9-NEXT: s_mov_b32 s11, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 9			; GFX9-NEXT: v_writelane_b32 v40, s31, 9
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 9			; GFX9-NEXT: v_readlane_b32 s31, v40, 9
	; GFX9-NEXT: v_readlane_b32 s30, v40, 8			; GFX9-NEXT: v_readlane_b32 s30, v40, 8
	; GFX9-NEXT: v_readlane_b32 s11, v40, 7			; GFX9-NEXT: v_readlane_b32 s11, v40, 7
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b64 s[34:35], 0			; GFX10-NEXT: s_mov_b64 s[34:35], 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	[13 unchanged lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 10			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b64 s[0:1], 0			; GFX11-NEXT: s_mov_b64 s[0:1], 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	[14 unchanged lines not shown]
	; GFX11-NEXT: v_readlane_b32 s10, v40, 6			; GFX11-NEXT: v_readlane_b32 s10, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 10			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	[13 unchanged lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(4) null			%load = load <2 x i64>, ptr addrspace(4) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_movk_i32 s4, 0x4400			; GFX9-NEXT: s_movk_i32 s4, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x4400
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x4400
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_movk_i32 s4, 0x4400
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_movk_i32 s4, 0x4400
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)			call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s4, 4.0			; GFX9-NEXT: s_mov_b32 s4, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 4.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 4.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 4.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 4.0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)			call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1.0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 5
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 3			; GFX9-NEXT: v_writelane_b32 v40, s30, 3
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 4.0			; GFX9-NEXT: s_mov_b32 s6, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 4			; GFX9-NEXT: v_writelane_b32 v40, s31, 4
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 4			; GFX9-NEXT: v_readlane_b32 s31, v40, 4
	; GFX9-NEXT: v_readlane_b32 s30, v40, 3			; GFX9-NEXT: v_readlane_b32 s30, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 5			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 4.0			; GFX10-NEXT: s_mov_b32 s6, 4.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 5			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1.0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 4.0			; GFX11-NEXT: s_mov_b32 s6, 4.0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 3			; GFX11-NEXT: v_writelane_b32 v40, s30, 3
	; GFX11-NEXT: v_writelane_b32 v40, s31, 4			; GFX11-NEXT: v_writelane_b32 v40, s31, 4
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 4			; GFX11-NEXT: v_readlane_b32 s31, v40, 4
	; GFX11-NEXT: v_readlane_b32 s30, v40, 3			; GFX11-NEXT: v_readlane_b32 s30, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 5			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 7
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 5			; GFX9-NEXT: v_writelane_b32 v40, s30, 5
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 4.0			; GFX9-NEXT: s_mov_b32 s6, 4.0
	; GFX9-NEXT: s_mov_b32 s7, -1.0			; GFX9-NEXT: s_mov_b32 s7, -1.0
	; GFX9-NEXT: s_mov_b32 s8, 0.5			; GFX9-NEXT: s_mov_b32 s8, 0.5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 6			; GFX9-NEXT: v_writelane_b32 v40, s31, 6
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 6			; GFX9-NEXT: v_readlane_b32 s31, v40, 6
	; GFX9-NEXT: v_readlane_b32 s30, v40, 5			; GFX9-NEXT: v_readlane_b32 s30, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 7			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 4.0			; GFX10-NEXT: s_mov_b32 s6, 4.0
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, -1.0			; GFX10-NEXT: s_mov_b32 s7, -1.0
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 0.5			; GFX10-NEXT: s_mov_b32 s8, 0.5
	; GFX10-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 7			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1.0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1.0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 4.0			; GFX11-NEXT: s_mov_b32 s6, 4.0
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, -1.0			; GFX11-NEXT: s_mov_b32 s7, -1.0
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: s_mov_b32 s8, 0.5			; GFX11-NEXT: s_mov_b32 s8, 0.5
	; GFX11-NEXT: v_writelane_b32 v40, s30, 5			; GFX11-NEXT: v_writelane_b32 v40, s30, 5
	; GFX11-NEXT: v_writelane_b32 v40, s31, 6			; GFX11-NEXT: v_writelane_b32 v40, s31, 6
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 6			; GFX11-NEXT: v_readlane_b32 s31, v40, 6
	; GFX11-NEXT: v_readlane_b32 s30, v40, 5			; GFX11-NEXT: v_readlane_b32 s30, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 7			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, -1.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, -1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0.5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0.5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 0x40100000			; GFX9-NEXT: s_mov_b32 s5, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 0x40100000
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 0x40100000
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)			call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 0			; GFX9-NEXT: s_mov_b32 s6, 0
	; GFX9-NEXT: s_mov_b32 s7, 0x40100000			; GFX9-NEXT: s_mov_b32 s7, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 0			; GFX10-NEXT: s_mov_b32 s6, 0
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 6			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 0			; GFX11-NEXT: s_mov_b32 s6, 0
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 0x40100000			; GFX11-NEXT: s_mov_b32 s7, 0x40100000
	; GFX11-NEXT: v_writelane_b32 v40, s30, 4			; GFX11-NEXT: v_writelane_b32 v40, s30, 4
	; GFX11-NEXT: v_writelane_b32 v40, s31, 5			; GFX11-NEXT: v_writelane_b32 v40, s31, 5
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 5			; GFX11-NEXT: v_readlane_b32 s31, v40, 5
	; GFX11-NEXT: v_readlane_b32 s30, v40, 4			; GFX11-NEXT: v_readlane_b32 s30, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 6			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 8
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 6			; GFX9-NEXT: v_writelane_b32 v40, s30, 6
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 0			; GFX9-NEXT: s_mov_b32 s6, 0
	; GFX9-NEXT: s_mov_b32 s7, 0x40100000			; GFX9-NEXT: s_mov_b32 s7, 0x40100000
	; GFX9-NEXT: s_mov_b32 s8, 0			; GFX9-NEXT: s_mov_b32 s8, 0
	; GFX9-NEXT: s_mov_b32 s9, 0x40200000			; GFX9-NEXT: s_mov_b32 s9, 0x40200000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 7			; GFX9-NEXT: v_writelane_b32 v40, s31, 7
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 7			; GFX9-NEXT: v_readlane_b32 s31, v40, 7
	; GFX9-NEXT: v_readlane_b32 s30, v40, 6			; GFX9-NEXT: v_readlane_b32 s30, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 8			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 0			; GFX10-NEXT: s_mov_b32 s6, 0
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 0			; GFX10-NEXT: s_mov_b32 s8, 0
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: s_mov_b32 s9, 0x40200000			; GFX10-NEXT: s_mov_b32 s9, 0x40200000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 8			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2.0
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 0			; GFX11-NEXT: s_mov_b32 s6, 0
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 0x40100000			; GFX11-NEXT: s_mov_b32 s7, 0x40100000
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: s_mov_b32 s8, 0			; GFX11-NEXT: s_mov_b32 s8, 0
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: s_mov_b32 s9, 0x40200000			; GFX11-NEXT: s_mov_b32 s9, 0x40200000
	; GFX11-NEXT: v_writelane_b32 v40, s30, 6			; GFX11-NEXT: v_writelane_b32 v40, s30, 6
	; GFX11-NEXT: v_writelane_b32 v40, s31, 7			; GFX11-NEXT: v_writelane_b32 v40, s31, 7
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 7			; GFX11-NEXT: v_readlane_b32 s31, v40, 7
	; GFX11-NEXT: v_readlane_b32 s30, v40, 6			; GFX11-NEXT: v_readlane_b32 s30, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 8			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 0x40200000			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 0x40200000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-NEXT: s_mov_b32 s33, s32
				; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3
	; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0			; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
				; GFX11-NEXT: s_mov_b32 s33, s32
				; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i16>, ptr addrspace(4) undef			%val = load <2 x i16>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x i16>, ptr addrspace(4) undef			%val = load <3 x i16>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x half>, ptr addrspace(4) undef			%val = load <3 x half>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0x20001			; GFX9-NEXT: s_mov_b32 s4, 0x20001
	; GFX9-NEXT: s_mov_b32 s5, 3			; GFX9-NEXT: s_mov_b32 s5, 3
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 3
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 3
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 0x20001
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 3
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 0x20001
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 3
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0x40003c00			; GFX9-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX9-NEXT: s_movk_i32 s5, 0x4400			; GFX9-NEXT: s_movk_i32 s5, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0x40003c00
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 0x40003c00
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_movk_i32 s5, 0x4400
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_movk_i32 s5, 0x4400
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i16>, ptr addrspace(4) undef			%val = load <4 x i16>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0x20001			; GFX9-NEXT: s_mov_b32 s4, 0x20001
	; GFX9-NEXT: s_mov_b32 s5, 0x40003			; GFX9-NEXT: s_mov_b32 s5, 0x40003
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 0x20001
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 0x40003
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 0x20001
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 0x40003
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-NEXT: s_mov_b32 s33, s32
				; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX11-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3
	; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0			; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
				; GFX11-NEXT: s_mov_b32 s33, s32
				; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x half>, ptr addrspace(4) undef			%val = load <2 x half>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i32>, ptr addrspace(4) undef			%val = load <2 x i32>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 5
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 3			; GFX9-NEXT: v_writelane_b32 v40, s30, 3
	; GFX9-NEXT: s_mov_b32 s4, 3			; GFX9-NEXT: s_mov_b32 s4, 3
	; GFX9-NEXT: s_mov_b32 s5, 4			; GFX9-NEXT: s_mov_b32 s5, 4
	; GFX9-NEXT: s_mov_b32 s6, 5			; GFX9-NEXT: s_mov_b32 s6, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 4			; GFX9-NEXT: v_writelane_b32 v40, s31, 4
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 4			; GFX9-NEXT: v_readlane_b32 s31, v40, 4
	; GFX9-NEXT: v_readlane_b32 s30, v40, 3			; GFX9-NEXT: v_readlane_b32 s30, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 5			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 3
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 3
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 5			; GFX10-NEXT: s_mov_b32 s6, 5
	; GFX10-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 5			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 3
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 4
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 3
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 4
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 5			; GFX11-NEXT: s_mov_b32 s6, 5
	; GFX11-NEXT: v_writelane_b32 v40, s30, 3			; GFX11-NEXT: v_writelane_b32 v40, s30, 3
	; GFX11-NEXT: v_writelane_b32 v40, s31, 4			; GFX11-NEXT: v_writelane_b32 v40, s31, 4
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 4			; GFX11-NEXT: v_readlane_b32 s31, v40, 4
	; GFX11-NEXT: v_readlane_b32 s30, v40, 3			; GFX11-NEXT: v_readlane_b32 s30, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 5			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 3			; GFX9-NEXT: s_mov_b32 s4, 3
	; GFX9-NEXT: s_mov_b32 s5, 4			; GFX9-NEXT: s_mov_b32 s5, 4
	; GFX9-NEXT: s_mov_b32 s6, 5			; GFX9-NEXT: s_mov_b32 s6, 5
	; GFX9-NEXT: s_mov_b32 s7, 6			; GFX9-NEXT: s_mov_b32 s7, 6
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 3
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 3
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 5			; GFX10-NEXT: s_mov_b32 s6, 5
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 6			; GFX10-NEXT: s_mov_b32 s7, 6
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 6			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 3
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 4
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 3
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 4
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 5			; GFX11-NEXT: s_mov_b32 s6, 5
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 6			; GFX11-NEXT: s_mov_b32 s7, 6
	; GFX11-NEXT: v_writelane_b32 v40, s30, 4			; GFX11-NEXT: v_writelane_b32 v40, s30, 4
	; GFX11-NEXT: v_writelane_b32 v40, s31, 5			; GFX11-NEXT: v_writelane_b32 v40, s31, 5
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 5			; GFX11-NEXT: v_readlane_b32 s31, v40, 5
	; GFX11-NEXT: v_readlane_b32 s30, v40, 4			; GFX11-NEXT: v_readlane_b32 s30, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 6			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 6			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)			call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 6			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 4			; GFX11-NEXT: v_writelane_b32 v40, s30, 4
	; GFX11-NEXT: v_writelane_b32 v40, s31, 5			; GFX11-NEXT: v_writelane_b32 v40, s31, 5
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 5			; GFX11-NEXT: v_readlane_b32 s31, v40, 5
	; GFX11-NEXT: v_readlane_b32 s30, v40, 4			; GFX11-NEXT: v_readlane_b32 s30, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 6			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i32>, ptr addrspace(4) undef			%val = load <4 x i32>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 6			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 3			; GFX11-NEXT: s_mov_b32 s6, 3
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 4			; GFX11-NEXT: s_mov_b32 s7, 4
	; GFX11-NEXT: v_writelane_b32 v40, s30, 4			; GFX11-NEXT: v_writelane_b32 v40, s30, 4
	; GFX11-NEXT: v_writelane_b32 v40, s31, 5			; GFX11-NEXT: v_writelane_b32 v40, s31, 5
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 5			; GFX11-NEXT: v_readlane_b32 s31, v40, 5
	; GFX11-NEXT: v_readlane_b32 s30, v40, 4			; GFX11-NEXT: v_readlane_b32 s30, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 6			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 7
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 5			; GFX9-NEXT: v_writelane_b32 v40, s30, 5
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_mov_b32 s8, 5			; GFX9-NEXT: s_mov_b32 s8, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 6			; GFX9-NEXT: v_writelane_b32 v40, s31, 6
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 6			; GFX9-NEXT: v_readlane_b32 s31, v40, 6
	; GFX9-NEXT: v_readlane_b32 s30, v40, 5			; GFX9-NEXT: v_readlane_b32 s30, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 7			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 5			; GFX10-NEXT: s_mov_b32 s8, 5
	; GFX10-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 7			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 3			; GFX11-NEXT: s_mov_b32 s6, 3
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 4			; GFX11-NEXT: s_mov_b32 s7, 4
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: s_mov_b32 s8, 5			; GFX11-NEXT: s_mov_b32 s8, 5
	; GFX11-NEXT: v_writelane_b32 v40, s30, 5			; GFX11-NEXT: v_writelane_b32 v40, s30, 5
	; GFX11-NEXT: v_writelane_b32 v40, s31, 6			; GFX11-NEXT: v_writelane_b32 v40, s31, 6
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 6			; GFX11-NEXT: v_readlane_b32 s31, v40, 6
	; GFX11-NEXT: v_readlane_b32 s30, v40, 5			; GFX11-NEXT: v_readlane_b32 s30, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 7			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx8 s[4:11], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx8 s[4:11], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 8			; GFX9-NEXT: v_writelane_b32 v40, s30, 8
	; GFX9-NEXT: v_writelane_b32 v40, s31, 9			; GFX9-NEXT: v_writelane_b32 v40, s31, 9
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 9			; GFX9-NEXT: v_readlane_b32 s31, v40, 9
	; GFX9-NEXT: v_readlane_b32 s30, v40, 8			; GFX9-NEXT: v_readlane_b32 s30, v40, 8
	; GFX9-NEXT: v_readlane_b32 s11, v40, 7			; GFX9-NEXT: v_readlane_b32 s11, v40, 7
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	[10 unchanged lines elided]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 10			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: v_writelane_b32 v40, s10, 6			; GFX11-NEXT: v_writelane_b32 v40, s10, 6
	; GFX11-NEXT: v_writelane_b32 v40, s11, 7			; GFX11-NEXT: v_writelane_b32 v40, s11, 7
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	[11 unchanged lines not shown]
	; GFX11-NEXT: v_readlane_b32 s10, v40, 6			; GFX11-NEXT: v_readlane_b32 s10, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 10			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	[10 unchanged lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(4), ptr addrspace(4) undef			%ptr = load ptr addrspace(4), ptr addrspace(4) undef
	%val = load <8 x i32>, ptr addrspace(4) %ptr			%val = load <8 x i32>, ptr addrspace(4) %ptr
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 8			; GFX9-NEXT: v_writelane_b32 v40, s30, 8
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_mov_b32 s8, 5			; GFX9-NEXT: s_mov_b32 s8, 5
	[11 unchanged lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 5			; GFX10-NEXT: s_mov_b32 s8, 5
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: s_mov_b32 s9, 6			; GFX10-NEXT: s_mov_b32 s9, 6
	[10 unchanged lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 10			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: s_mov_b32 s4, 1
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
				; GFX11-NEXT: v_writelane_b32 v40, s5, 1
				; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, 1
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_mov_b32 s5, 2
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: s_mov_b32 s6, 3			; GFX11-NEXT: s_mov_b32 s6, 3
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: s_mov_b32 s7, 4			; GFX11-NEXT: s_mov_b32 s7, 4
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: s_mov_b32 s8, 5			; GFX11-NEXT: s_mov_b32 s8, 5
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: s_mov_b32 s9, 6			; GFX11-NEXT: s_mov_b32 s9, 6
	[11 unchanged lines not shown]
	; GFX11-NEXT: v_readlane_b32 s10, v40, 6			; GFX11-NEXT: v_readlane_b32 s10, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 10			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 6			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 6
	[10 unchanged lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 18
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	; GFX9-NEXT: v_writelane_b32 v40, s12, 8			; GFX9-NEXT: v_writelane_b32 v40, s12, 8
	; GFX9-NEXT: v_writelane_b32 v40, s13, 9			; GFX9-NEXT: v_writelane_b32 v40, s13, 9
	; GFX9-NEXT: v_writelane_b32 v40, s14, 10			; GFX9-NEXT: v_writelane_b32 v40, s14, 10
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s15, 11			; GFX9-NEXT: v_writelane_b32 v40, s15, 11
	; GFX9-NEXT: v_writelane_b32 v40, s16, 12			; GFX9-NEXT: v_writelane_b32 v40, s16, 12
	; GFX9-NEXT: v_writelane_b32 v40, s17, 13			; GFX9-NEXT: v_writelane_b32 v40, s17, 13
	; GFX9-NEXT: v_writelane_b32 v40, s18, 14			; GFX9-NEXT: v_writelane_b32 v40, s18, 14
	; GFX9-NEXT: v_writelane_b32 v40, s19, 15			; GFX9-NEXT: v_writelane_b32 v40, s19, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 16			; GFX9-NEXT: v_writelane_b32 v40, s30, 16
	; GFX9-NEXT: v_writelane_b32 v40, s31, 17			; GFX9-NEXT: v_writelane_b32 v40, s31, 17
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	[11 unchanged lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 18			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-NEXT: v_writelane_b32 v40, s12, 8
	[26 unchanged lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 18			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: v_writelane_b32 v40, s10, 6			; GFX11-NEXT: v_writelane_b32 v40, s10, 6
	; GFX11-NEXT: v_writelane_b32 v40, s11, 7			; GFX11-NEXT: v_writelane_b32 v40, s11, 7
	; GFX11-NEXT: v_writelane_b32 v40, s12, 8			; GFX11-NEXT: v_writelane_b32 v40, s12, 8
	[27 unchanged lines not shown]
	; GFX11-NEXT: v_readlane_b32 s10, v40, 6			; GFX11-NEXT: v_readlane_b32 s10, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 18			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8
	[26 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(4), ptr addrspace(4) undef			%ptr = load ptr addrspace(4), ptr addrspace(4) undef
	%val = load <16 x i32>, ptr addrspace(4) %ptr			%val = load <16 x i32>, ptr addrspace(4) %ptr
	call amdgpu_gfx void @external_void_func_v16i32_inreg(<16 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v16i32_inreg(<16 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 28
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	[12 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v40, s23, 19			; GFX9-NEXT: v_writelane_b32 v40, s23, 19
	; GFX9-NEXT: v_writelane_b32 v40, s24, 20			; GFX9-NEXT: v_writelane_b32 v40, s24, 20
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40			; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s25, 21			; GFX9-NEXT: v_writelane_b32 v40, s25, 21
	; GFX9-NEXT: v_writelane_b32 v40, s26, 22			; GFX9-NEXT: v_writelane_b32 v40, s26, 22
	; GFX9-NEXT: v_writelane_b32 v40, s27, 23			; GFX9-NEXT: v_writelane_b32 v40, s27, 23
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s28, 24			; GFX9-NEXT: v_writelane_b32 v40, s28, 24
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s46			; GFX9-NEXT: v_mov_b32_e32 v0, s46
	; GFX9-NEXT: v_writelane_b32 v40, s29, 25			; GFX9-NEXT: v_writelane_b32 v40, s29, 25
	; GFX9-NEXT: v_mov_b32_e32 v1, s47			; GFX9-NEXT: v_mov_b32_e32 v1, s47
	; GFX9-NEXT: v_mov_b32_e32 v2, s48			; GFX9-NEXT: v_mov_b32_e32 v2, s48
	[46 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 28			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-NEXT: v_writelane_b32 v40, s12, 8
	[71 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 28			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: v_writelane_b32 v40, s10, 6			; GFX11-NEXT: v_writelane_b32 v40, s10, 6
	; GFX11-NEXT: v_writelane_b32 v40, s11, 7			; GFX11-NEXT: v_writelane_b32 v40, s11, 7
	; GFX11-NEXT: v_writelane_b32 v40, s12, 8			; GFX11-NEXT: v_writelane_b32 v40, s12, 8
	[66 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s10, v40, 6			; GFX11-NEXT: v_readlane_b32 s10, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 28			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8
	[67 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(4), ptr addrspace(4) undef			%ptr = load ptr addrspace(4), ptr addrspace(4) undef
	%val = load <32 x i32>, ptr addrspace(4) %ptr			%val = load <32 x i32>, ptr addrspace(4) %ptr
	call amdgpu_gfx void @external_void_func_v32i32_inreg(<32 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v32i32_inreg(<32 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32_i32_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32_i32_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 28
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	[13 lines not shown]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dword s52, s[34:35], 0x0			; GFX9-NEXT: s_load_dword s52, s[34:35], 0x0
	; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35			; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35
	; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35			; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35
	; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40			; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s24, 20			; GFX9-NEXT: v_writelane_b32 v40, s24, 20
	; GFX9-NEXT: v_writelane_b32 v40, s25, 21			; GFX9-NEXT: v_writelane_b32 v40, s25, 21
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s26, 22			; GFX9-NEXT: v_writelane_b32 v40, s26, 22
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s52			; GFX9-NEXT: v_mov_b32_e32 v0, s52
	; GFX9-NEXT: v_writelane_b32 v40, s27, 23			; GFX9-NEXT: v_writelane_b32 v40, s27, 23
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24
	; GFX9-NEXT: v_mov_b32_e32 v0, s46			; GFX9-NEXT: v_mov_b32_e32 v0, s46
	[50 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 28			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-NEXT: v_writelane_b32 v40, s12, 8
	[76 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 28			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: v_writelane_b32 v40, s6, 2			; GFX11-NEXT: v_writelane_b32 v40, s6, 2
	; GFX11-NEXT: v_writelane_b32 v40, s7, 3			; GFX11-NEXT: v_writelane_b32 v40, s7, 3
	; GFX11-NEXT: v_writelane_b32 v40, s8, 4			; GFX11-NEXT: v_writelane_b32 v40, s8, 4
	; GFX11-NEXT: v_writelane_b32 v40, s9, 5			; GFX11-NEXT: v_writelane_b32 v40, s9, 5
	; GFX11-NEXT: v_writelane_b32 v40, s10, 6			; GFX11-NEXT: v_writelane_b32 v40, s10, 6
	; GFX11-NEXT: v_writelane_b32 v40, s11, 7			; GFX11-NEXT: v_writelane_b32 v40, s11, 7
	; GFX11-NEXT: v_writelane_b32 v40, s12, 8			; GFX11-NEXT: v_writelane_b32 v40, s12, 8
	[69 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s10, v40, 6			; GFX11-NEXT: v_readlane_b32 s10, v40, 6
	; GFX11-NEXT: v_readlane_b32 s9, v40, 5			; GFX11-NEXT: v_readlane_b32 s9, v40, 5
	; GFX11-NEXT: v_readlane_b32 s8, v40, 4			; GFX11-NEXT: v_readlane_b32 s8, v40, 4
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 28			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8
	; (72 lines not shown)
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load ptr addrspace(4), ptr addrspace(4) undef			%ptr0 = load ptr addrspace(4), ptr addrspace(4) undef
	%val0 = load <32 x i32>, ptr addrspace(4) %ptr0			%val0 = load <32 x i32>, ptr addrspace(4) %ptr0
	%val1 = load i32, ptr addrspace(4) undef			%val1 = load i32, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v32i32_i32_inreg(<32 x i32> inreg %val0, i32 inreg %val1)			call amdgpu_gfx void @external_void_func_v32i32_i32_inreg(<32 x i32> inreg %val0, i32 inreg %val1)
	ret void			ret void
	}			}

	define amdgpu_gfx void @stack_passed_arg_alignment_v32i32_f64(<32 x i32> %val, double %tmp) #0 {			define amdgpu_gfx void @stack_passed_arg_alignment_v32i32_f64(<32 x i32> %val, double %tmp) #0 {
	; GFX9-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX9-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33			; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33
	; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12
	; GFX10-NEXT: s_waitcnt vmcnt(1)			; GFX10-NEXT: s_waitcnt vmcnt(1)
	; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX11-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:8
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:12
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: scratch_load_b64 v[32:33], off, s33			; GFX11-NEXT: scratch_load_b64 v[32:33], off, s33
				; GFX11-NEXT: s_add_i32 s32, s32, 32
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: scratch_store_b64 off, v[32:33], s32			; GFX11-NEXT: scratch_store_b64 off, v[32:33], s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:8
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:12
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:12 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33			; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:12
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)			call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)
	ret void			ret void
	}			}

	define amdgpu_gfx void @stack_12xv3i32() #0 {			define amdgpu_gfx void @stack_12xv3i32() #0 {
	; GFX9-LABEL: stack_12xv3i32:			; GFX9-LABEL: stack_12xv3i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 12			; GFX9-NEXT: v_mov_b32_e32 v0, 12
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 14			; GFX9-NEXT: v_mov_b32_e32 v0, 14
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; (35 lines not shown)
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_12xv3i32:			; GFX10-LABEL: stack_12xv3i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 12			; GFX10-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-NEXT: v_mov_b32_e32 v1, 13			; GFX10-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-NEXT: v_mov_b32_e32 v2, 14			; GFX10-NEXT: v_mov_b32_e32 v2, 14
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_mov_b32_e32 v3, 15			; GFX10-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12			; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12
	; (32 lines not shown)
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_12xv3i32:			; GFX11-LABEL: stack_12xv3i32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2
	; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13			; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13
	; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15			; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v4, 1 :: v_dual_mov_b32 v5, 1
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
	; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 1			; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 1
				; GFX11-NEXT: v_dual_mov_b32 v4, 1 :: v_dual_mov_b32 v5, 1
	; GFX11-NEXT: v_dual_mov_b32 v6, 2 :: v_dual_mov_b32 v7, 2			; GFX11-NEXT: v_dual_mov_b32 v6, 2 :: v_dual_mov_b32 v7, 2
	; GFX11-NEXT: v_dual_mov_b32 v8, 2 :: v_dual_mov_b32 v9, 3			; GFX11-NEXT: v_dual_mov_b32 v8, 2 :: v_dual_mov_b32 v9, 3
	; GFX11-NEXT: v_dual_mov_b32 v10, 3 :: v_dual_mov_b32 v11, 3			; GFX11-NEXT: v_dual_mov_b32 v10, 3 :: v_dual_mov_b32 v11, 3
	; GFX11-NEXT: v_dual_mov_b32 v12, 4 :: v_dual_mov_b32 v13, 4			; GFX11-NEXT: v_dual_mov_b32 v12, 4 :: v_dual_mov_b32 v13, 4
	; GFX11-NEXT: v_dual_mov_b32 v14, 4 :: v_dual_mov_b32 v15, 5			; GFX11-NEXT: v_dual_mov_b32 v14, 4 :: v_dual_mov_b32 v15, 5
	; GFX11-NEXT: v_dual_mov_b32 v16, 5 :: v_dual_mov_b32 v17, 5			; GFX11-NEXT: v_dual_mov_b32 v16, 5 :: v_dual_mov_b32 v17, 5
	; GFX11-NEXT: v_dual_mov_b32 v18, 6 :: v_dual_mov_b32 v19, 6			; GFX11-NEXT: v_dual_mov_b32 v18, 6 :: v_dual_mov_b32 v19, 6
	; GFX11-NEXT: v_dual_mov_b32 v20, 6 :: v_dual_mov_b32 v21, 7			; GFX11-NEXT: v_dual_mov_b32 v20, 6 :: v_dual_mov_b32 v21, 7
	; GFX11-NEXT: v_dual_mov_b32 v22, 7 :: v_dual_mov_b32 v23, 7			; GFX11-NEXT: v_dual_mov_b32 v22, 7 :: v_dual_mov_b32 v23, 7
	; GFX11-NEXT: v_dual_mov_b32 v24, 8 :: v_dual_mov_b32 v25, 8			; GFX11-NEXT: v_dual_mov_b32 v24, 8 :: v_dual_mov_b32 v25, 8
	; GFX11-NEXT: v_dual_mov_b32 v26, 8 :: v_dual_mov_b32 v27, 9			; GFX11-NEXT: v_dual_mov_b32 v26, 8 :: v_dual_mov_b32 v27, 9
	; GFX11-NEXT: v_dual_mov_b32 v28, 9 :: v_dual_mov_b32 v29, 9			; GFX11-NEXT: v_dual_mov_b32 v28, 9 :: v_dual_mov_b32 v29, 9
	; GFX11-NEXT: v_dual_mov_b32 v30, 10 :: v_dual_mov_b32 v31, 11			; GFX11-NEXT: v_dual_mov_b32 v30, 10 :: v_dual_mov_b32 v31, 11
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_12xv3i32:			; GFX10-SCRATCH-LABEL: stack_12xv3i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1
				; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v9, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v9, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v10, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v10, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v11, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v11, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, 4
	; (19 lines not shown)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_12xv3i32(			call amdgpu_gfx void @external_void_func_12xv3i32(
	<3 x i32><i32 0, i32 0, i32 0>,			<3 x i32><i32 0, i32 0, i32 0>,
	<3 x i32><i32 1, i32 1, i32 1>,			<3 x i32><i32 1, i32 1, i32 1>,
	[11 lines not shown]
	}			}

	define amdgpu_gfx void @stack_8xv5i32() #0 {			define amdgpu_gfx void @stack_8xv5i32() #0 {
	; GFX9-LABEL: stack_8xv5i32:			; GFX9-LABEL: stack_8xv5i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 9			; GFX9-NEXT: v_mov_b32_e32 v0, 9
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 10			; GFX9-NEXT: v_mov_b32_e32 v0, 10
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	[43 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_8xv5i32:			; GFX10-LABEL: stack_8xv5i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 8			; GFX10-NEXT: v_mov_b32_e32 v0, 8
	; GFX10-NEXT: v_mov_b32_e32 v1, 9			; GFX10-NEXT: v_mov_b32_e32 v1, 9
	; GFX10-NEXT: v_mov_b32_e32 v2, 10			; GFX10-NEXT: v_mov_b32_e32 v2, 10
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: v_mov_b32_e32 v0, 11			; GFX10-NEXT: v_mov_b32_e32 v0, 11
	; GFX10-NEXT: v_mov_b32_e32 v1, 12			; GFX10-NEXT: v_mov_b32_e32 v1, 12
	; GFX10-NEXT: v_mov_b32_e32 v2, 13			; GFX10-NEXT: v_mov_b32_e32 v2, 13
	[40 lines not shown]
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_8xv5i32:			; GFX11-LABEL: stack_8xv5i32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2
	; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13			; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13
	; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15			; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15
	; GFX11-NEXT: v_dual_mov_b32 v4, 8 :: v_dual_mov_b32 v5, 9			; GFX11-NEXT: v_dual_mov_b32 v4, 8 :: v_dual_mov_b32 v5, 9
	; GFX11-NEXT: v_dual_mov_b32 v6, 10 :: v_dual_mov_b32 v7, 11			; GFX11-NEXT: v_dual_mov_b32 v6, 10 :: v_dual_mov_b32 v7, 11
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16
	; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32			; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
	; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0			; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0
	[15 lines not shown]
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_8xv5i32:			; GFX10-SCRATCH-LABEL: stack_8xv5i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	[29 lines not shown]
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_8xv5i32(			call amdgpu_gfx void @external_void_func_8xv5i32(
	<5 x i32><i32 0, i32 0, i32 0, i32 0, i32 0>,			<5 x i32><i32 0, i32 0, i32 0, i32 0, i32 0>,
	<5 x i32><i32 1, i32 1, i32 1, i32 1, i32 1>,			<5 x i32><i32 1, i32 1, i32 1, i32 1, i32 1>,
	<5 x i32><i32 2, i32 2, i32 2, i32 2, i32 2>,			<5 x i32><i32 2, i32 2, i32 2, i32 2, i32 2>,
	<5 x i32><i32 3, i32 3, i32 3, i32 3, i32 3>,			<5 x i32><i32 3, i32 3, i32 3, i32 3, i32 3>,
	<5 x i32><i32 4, i32 4, i32 4, i32 4, i32 4>,			<5 x i32><i32 4, i32 4, i32 4, i32 4, i32 4>,
	<5 x i32><i32 5, i32 5, i32 5, i32 5, i32 5>,			<5 x i32><i32 5, i32 5, i32 5, i32 5, i32 5>,
	<5 x i32><i32 6, i32 7, i32 8, i32 9, i32 10>,			<5 x i32><i32 6, i32 7, i32 8, i32 9, i32 10>,
	<5 x i32><i32 11, i32 12, i32 13, i32 14, i32 15>)			<5 x i32><i32 11, i32 12, i32 13, i32 14, i32 15>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @stack_8xv5f32() #0 {			define amdgpu_gfx void @stack_8xv5f32() #0 {
	; GFX9-LABEL: stack_8xv5f32:			; GFX9-LABEL: stack_8xv5f32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41100000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41100000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41200000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41200000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	[43 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_8xv5f32:			; GFX10-LABEL: stack_8xv5f32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX10-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x41100000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x41100000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0x41200000			; GFX10-NEXT: v_mov_b32_e32 v2, 0x41200000
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000			; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000			; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000
	[40 lines not shown]
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_8xv5f32:			; GFX11-LABEL: stack_8xv5f32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x41400000			; GFX11-NEXT: v_mov_b32_e32 v0, 0x41400000
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x41500000			; GFX11-NEXT: v_mov_b32_e32 v1, 0x41500000
	; GFX11-NEXT: v_mov_b32_e32 v2, 0x41600000			; GFX11-NEXT: v_mov_b32_e32 v2, 0x41600000
	; GFX11-NEXT: v_mov_b32_e32 v3, 0x41700000			; GFX11-NEXT: v_mov_b32_e32 v3, 0x41700000
	; GFX11-NEXT: v_mov_b32_e32 v4, 0x41000000			; GFX11-NEXT: v_mov_b32_e32 v4, 0x41000000
	; GFX11-NEXT: v_mov_b32_e32 v5, 0x41100000			; GFX11-NEXT: v_mov_b32_e32 v5, 0x41100000
	; GFX11-NEXT: v_mov_b32_e32 v6, 0x41200000			; GFX11-NEXT: v_mov_b32_e32 v6, 0x41200000
	; GFX11-NEXT: v_mov_b32_e32 v7, 0x41300000			; GFX11-NEXT: v_mov_b32_e32 v7, 0x41300000
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16
	; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32			; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32
	; GFX11-NEXT: v_mov_b32_e32 v6, 1.0			; GFX11-NEXT: v_mov_b32_e32 v6, 1.0
	; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0			; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
	[17 lines not shown]
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_8xv5f32:			; GFX10-SCRATCH-LABEL: stack_8xv5f32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x41400000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x41400000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	[29 lines not shown]
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_8xv5f32(			call amdgpu_gfx void @external_void_func_8xv5f32(
	<5 x float><float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,			<5 x float><float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,
	<5 x float><float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,			<5 x float><float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,
	[21 lines not shown]

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX9 %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX9 %s
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1010 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10 %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1010 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10 %s
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX11 %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX11 %s

	declare hidden amdgpu_gfx void @external_void_func_void() #0			declare hidden amdgpu_gfx void @external_void_func_void() #0

	define amdgpu_gfx void @test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {			define amdgpu_gfx void @test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
	; GFX9-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX9-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX11-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 4			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s5, 1			; GFX11-NEXT: v_writelane_b32 v40, s5, 1
	; GFX11-NEXT: s_getpc_b64 s[4:5]			; GFX11-NEXT: s_getpc_b64 s[4:5]
	; GFX11-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 2
	; GFX11-NEXT: v_writelane_b32 v40, s31, 3			; GFX11-NEXT: v_writelane_b32 v40, s31, 3
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 4			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "", ""() #0			call void asm sideeffect "", ""() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	ret void			ret void
	}			}
	(94 lines not shown)
	}			}

	define amdgpu_gfx void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX9-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s31			; GFX9-NEXT: ; def s31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_mov_b32 s4, s31			; GFX9-NEXT: s_mov_b32 s4, s31
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: s_mov_b32 s31, s4			; GFX9-NEXT: s_mov_b32 s31, s4
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s31			; GFX9-NEXT: ; use s31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1			; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s31			; GFX10-NEXT: ; def s31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_mov_b32 s4, s31			; GFX10-NEXT: s_mov_b32 s4, s31
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: s_mov_b32 s31, s4			; GFX10-NEXT: s_mov_b32 s31, s4
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s31			; GFX10-NEXT: ; use s31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX11-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1			; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def s31			; GFX11-NEXT: ; def s31
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_mov_b32 s4, s31			; GFX11-NEXT: s_mov_b32 s4, s31
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_mov_b32 s31, s4			; GFX11-NEXT: s_mov_b32 s31, s4
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use s31			; GFX11-NEXT: ; use s31
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s31 = call i32 asm sideeffect "; def $0", "={s31}"()			%s31 = call i32 asm sideeffect "; def $0", "={s31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s31}"(i32 %s31)			call void asm sideeffect "; use $0", "{s31}"(i32 %s31)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v42, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v31			; GFX9-NEXT: ; def v31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v41, v31			; GFX9-NEXT: v_mov_b32_e32 v41, v31
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v31, v41			; GFX9-NEXT: v_mov_b32_e32 v31, v41
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v31			; GFX9-NEXT: ; use v31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v42, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v42, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v31			; GFX10-NEXT: ; def v31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v41, v31			; GFX10-NEXT: v_mov_b32_e32 v41, v31
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_mov_b32_e32 v31, v41			; GFX10-NEXT: v_mov_b32_e32 v31, v41
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v31			; GFX10-NEXT: ; use v31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v42, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4
				; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX11-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:4 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:4
				; GFX11-NEXT: scratch_store_b32 off, v42, s32 offset:8
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v42, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v41, s33 ; 4-byte Folded Spill
				; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def v31			; GFX11-NEXT: ; def v31
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_mov_b32_e32 v41, v31			; GFX11-NEXT: v_mov_b32_e32 v41, v31
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v31, v41			; GFX11-NEXT: v_mov_b32_e32 v31, v41
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v31			; GFX11-NEXT: ; use v31
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v42, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:4 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:4
				; GFX11-NEXT: scratch_load_b32 v42, off, s32 offset:8
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%v31 = call i32 asm sideeffect "; def $0", "={v31}"()			%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{v31}"(i32 %v31)			call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
	ret void			ret void
	}			}


	define amdgpu_gfx void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_s33:			; GFX9-LABEL: test_call_void_func_void_preserves_s33:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s33			; GFX9-NEXT: ; def s33
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s33			; GFX9-NEXT: ; use s33
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s33:			; GFX10-LABEL: test_call_void_func_void_preserves_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s33			; GFX10-NEXT: ; def s33
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s33			; GFX10-NEXT: s_mov_b32 s4, s33
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: s_mov_b32 s33, s4			; GFX10-NEXT: s_mov_b32 s33, s4
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s33			; GFX10-NEXT: ; use s33
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_preserves_s33:			; GFX11-LABEL: test_call_void_func_void_preserves_s33:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def s33			; GFX11-NEXT: ; def s33
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, s33			; GFX11-NEXT: s_mov_b32 s4, s33
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_mov_b32 s33, s4			; GFX11-NEXT: s_mov_b32 s33, s4
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use s33			; GFX11-NEXT: ; use s33
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s33 = call i32 asm sideeffect "; def $0", "={s33}"()			%s33 = call i32 asm sideeffect "; def $0", "={s33}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s33}"(i32 %s33)			call void asm sideeffect "; use $0", "{s33}"(i32 %s33)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_s34:			; GFX9-LABEL: test_call_void_func_void_preserves_s34:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s34			; GFX9-NEXT: ; def s34
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_mov_b32 s4, s34			; GFX9-NEXT: s_mov_b32 s4, s34
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: s_mov_b32 s34, s4			; GFX9-NEXT: s_mov_b32 s34, s4
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s34			; GFX9-NEXT: ; use s34
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s34:			; GFX10-LABEL: test_call_void_func_void_preserves_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[36:37]			; GFX10-NEXT: s_getpc_b64 s[36:37]
	; GFX10-NEXT: s_add_u32 s36, s36, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s36, s36, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s37, s37, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s37, s37, external_void_func_void@rel32@hi+12
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s34			; GFX10-NEXT: ; def s34
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s34			; GFX10-NEXT: s_mov_b32 s4, s34
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX10-NEXT: s_mov_b32 s34, s4			; GFX10-NEXT: s_mov_b32 s34, s4
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s34			; GFX10-NEXT: ; use s34
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_preserves_s34:			; GFX11-LABEL: test_call_void_func_void_preserves_s34:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def s34			; GFX11-NEXT: ; def s34
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, s34			; GFX11-NEXT: s_mov_b32 s4, s34
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_mov_b32 s34, s4			; GFX11-NEXT: s_mov_b32 s34, s4
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use s34			; GFX11-NEXT: ; use s34
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s34 = call i32 asm sideeffect "; def $0", "={s34}"()			%s34 = call i32 asm sideeffect "; def $0", "={s34}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s34}"(i32 %s34)			call void asm sideeffect "; use $0", "{s34}"(i32 %s34)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_preserves_v40(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_v40(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_v40:			; GFX9-LABEL: test_call_void_func_void_preserves_v40:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v41, s33, 2			; GFX9-NEXT: v_writelane_b32 v42, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v41, s30, 0			; GFX9-NEXT: v_writelane_b32 v41, s30, 0
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v41, s31, 1			; GFX9-NEXT: v_writelane_b32 v41, s31, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v40			; GFX9-NEXT: ; def v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v40			; GFX9-NEXT: ; use v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v41, 1			; GFX9-NEXT: v_readlane_b32 s31, v41, 1
	; GFX9-NEXT: v_readlane_b32 s30, v41, 0			; GFX9-NEXT: v_readlane_b32 s30, v41, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v41, 2			; GFX9-NEXT: v_readlane_b32 s33, v42, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_v40:			; GFX10-LABEL: test_call_void_func_void_preserves_v40:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v41, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s30, 0
				; GFX10-NEXT: v_writelane_b32 v42, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v41, s31, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v40			; GFX10-NEXT: ; def v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v41, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v41, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v40			; GFX10-NEXT: ; use v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v41, 1			; GFX10-NEXT: v_readlane_b32 s31, v41, 1
	; GFX10-NEXT: v_readlane_b32 s30, v41, 0			; GFX10-NEXT: v_readlane_b32 s30, v41, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v41, 2			; GFX10-NEXT: v_readlane_b32 s33, v42, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
				; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_preserves_v40:			; GFX11-LABEL: test_call_void_func_void_preserves_v40:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
				; GFX11-NEXT: scratch_store_b32 off, v42, s32 offset:8
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v41, s33, 2			; GFX11-NEXT: v_writelane_b32 v41, s30, 0
				; GFX11-NEXT: v_writelane_b32 v42, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: scratch_store_b32 off, v40, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v40, s33 ; 4-byte Folded Spill
				; GFX11-NEXT: v_writelane_b32 v41, s31, 1
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def v40			; GFX11-NEXT: ; def v40
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_writelane_b32 v41, s30, 0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v41, s31, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v40			; GFX11-NEXT: ; use v40
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: v_readlane_b32 s31, v41, 1			; GFX11-NEXT: v_readlane_b32 s31, v41, 1
	; GFX11-NEXT: v_readlane_b32 s30, v41, 0			; GFX11-NEXT: v_readlane_b32 s30, v41, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v41, 2			; GFX11-NEXT: v_readlane_b32 s33, v42, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
				; GFX11-NEXT: scratch_load_b32 v42, off, s32 offset:8
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%v40 = call i32 asm sideeffect "; def $0", "={v40}"()			%v40 = call i32 asm sideeffect "; def $0", "={v40}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{v40}"(i32 %v40)			call void asm sideeffect "; use $0", "{v40}"(i32 %v40)
	ret void			ret void
	}			}
	}			}

	define amdgpu_gfx void @test_call_void_func_void_clobber_s33() #0 {			define amdgpu_gfx void @test_call_void_func_void_clobber_s33() #0 {
	; GFX9-LABEL: test_call_void_func_void_clobber_s33:			; GFX9-LABEL: test_call_void_func_void_clobber_s33:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s33:			; GFX10-LABEL: test_call_void_func_void_clobber_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_clobber_s33:			; GFX11-LABEL: test_call_void_func_void_clobber_s33:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s33@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s33@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s33@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s33@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @void_func_void_clobber_s33()			call amdgpu_gfx void @void_func_void_clobber_s33()
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {			define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {
	; GFX9-LABEL: test_call_void_func_void_clobber_s34:			; GFX9-LABEL: test_call_void_func_void_clobber_s34:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s34:			; GFX10-LABEL: test_call_void_func_void_clobber_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_clobber_s34:			; GFX11-LABEL: test_call_void_func_void_clobber_s34:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v40, s30, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s34@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s34@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s34@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s34@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v40, s31, 1
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @void_func_void_clobber_s34()			call amdgpu_gfx void @void_func_void_clobber_s34()
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {
	; GFX9-LABEL: callee_saved_sgpr_kernel:			; GFX9-LABEL: callee_saved_sgpr_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s40			; GFX9-NEXT: ; def s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_mov_b32 s4, s40			; GFX9-NEXT: s_mov_b32 s4, s40
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s4			; GFX9-NEXT: ; use s4
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s40			; GFX10-NEXT: ; def s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s40			; GFX10-NEXT: s_mov_b32 s4, s40
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s4			; GFX10-NEXT: ; use s4
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: callee_saved_sgpr_kernel:			; GFX11-LABEL: callee_saved_sgpr_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32
				; GFX11-NEXT: scratch_store_b32 off, v41, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v41, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def s40			; GFX11-NEXT: ; def s40
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, s40			; GFX11-NEXT: s_mov_b32 s4, s40
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2			; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use s4			; GFX11-NEXT: ; use s4
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v41, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32
				; GFX11-NEXT: scratch_load_b32 v41, off, s32 offset:4
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {
	; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v42, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s40			; GFX9-NEXT: ; def s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	[12 unchanged lines not shown]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v41			; GFX9-NEXT: ; use v41
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v42, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v42, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s40			; GFX10-NEXT: ; def s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s40			; GFX10-NEXT: s_mov_b32 s4, s40
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v32			; GFX10-NEXT: ; def v32
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_mov_b32_e32 v41, v32			; GFX10-NEXT: v_mov_b32_e32 v41, v32
				; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s4			; GFX10-NEXT: ; use s4
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v41			; GFX10-NEXT: ; use v41
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v42, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4
				; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX11-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:4 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:4
				; GFX11-NEXT: scratch_store_b32 off, v42, s32 offset:8
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 3			; GFX11-NEXT: v_writelane_b32 v40, s4, 0
				; GFX11-NEXT: v_writelane_b32 v42, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v41, s33 ; 4-byte Folded Spill
				; GFX11-NEXT: v_writelane_b32 v40, s30, 1
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def s40			; GFX11-NEXT: ; def s40
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_writelane_b32 v40, s4, 0
	; GFX11-NEXT: s_mov_b32 s4, s40			; GFX11-NEXT: s_mov_b32 s4, s40
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; def v32			; GFX11-NEXT: ; def v32
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_mov_b32_e32 v41, v32			; GFX11-NEXT: v_mov_b32_e32 v41, v32
				; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_writelane_b32 v40, s31, 2
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use s4			; GFX11-NEXT: ; use s4
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v41			; GFX11-NEXT: ; use v41
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v40, 3			; GFX11-NEXT: v_readlane_b32 s33, v42, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:4 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:4
				; GFX11-NEXT: scratch_load_b32 v42, off, s32 offset:8
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0			%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	call void asm sideeffect "; use $0", "v"(i32 %v32) #0			call void asm sideeffect "; use $0", "v"(i32 %v32) #0
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind noinline }			attributes #1 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll

	[21 unchanged lines not shown]

	define amdgpu_gfx void @call_i1() #0 {			define amdgpu_gfx void @call_i1() #0 {
	; GFX9-LABEL: call_i1:			; GFX9-LABEL: call_i1:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v1, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v1, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_i1:			; GFX10-LABEL: call_i1:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v1, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v1, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_i1:			; GFX11-LABEL: call_i1:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v1, s32 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v1, s32 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v1, s33, 2			; GFX11-NEXT: s_mov_b32 s2, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_i1@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_i1@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_i1@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_i1@gotpcrel32@hi+12
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s30, 0			; GFX11-NEXT: v_writelane_b32 v1, s30, 0
				; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s31, 1			; GFX11-NEXT: v_writelane_b32 v1, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v1, 2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s32 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i1 @return_i1()			call amdgpu_gfx i1 @return_i1()
	ret void			ret void

	define amdgpu_gfx void @call_i16() #0 {			define amdgpu_gfx void @call_i16() #0 {
	; GFX9-LABEL: call_i16:			; GFX9-LABEL: call_i16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v1, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v1, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_i16:			; GFX10-LABEL: call_i16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v1, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v1, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_i16:			; GFX11-LABEL: call_i16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v1, s32 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v1, s32 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v1, s33, 2			; GFX11-NEXT: s_mov_b32 s2, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_i16@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_i16@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_i16@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_i16@gotpcrel32@hi+12
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s30, 0			; GFX11-NEXT: v_writelane_b32 v1, s30, 0
				; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s31, 1			; GFX11-NEXT: v_writelane_b32 v1, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v1, 2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s32 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i16 @return_i16()			call amdgpu_gfx i16 @return_i16()
	ret void			ret void

	define amdgpu_gfx void @call_2xi16() #0 {			define amdgpu_gfx void @call_2xi16() #0 {
	; GFX9-LABEL: call_2xi16:			; GFX9-LABEL: call_2xi16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v1, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v1, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_2xi16:			; GFX10-LABEL: call_2xi16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v1, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v1, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_2xi16:			; GFX11-LABEL: call_2xi16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v1, s32 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v1, s32 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v1, s33, 2			; GFX11-NEXT: s_mov_b32 s2, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_2xi16@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_2xi16@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_2xi16@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_2xi16@gotpcrel32@hi+12
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s30, 0			; GFX11-NEXT: v_writelane_b32 v1, s30, 0
				; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s31, 1			; GFX11-NEXT: v_writelane_b32 v1, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v1, 2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s32 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <2 x i16> @return_2xi16()			call amdgpu_gfx <2 x i16> @return_2xi16()
	ret void			ret void

	define amdgpu_gfx void @call_3xi16() #0 {			define amdgpu_gfx void @call_3xi16() #0 {
	; GFX9-LABEL: call_3xi16:			; GFX9-LABEL: call_3xi16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v2, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v2, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_3xi16:			; GFX10-LABEL: call_3xi16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v2, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v2, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_3xi16:			; GFX11-LABEL: call_3xi16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v2, s32 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v2, s32 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v2, s33, 2			; GFX11-NEXT: s_mov_b32 s2, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_3xi16@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_3xi16@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_3xi16@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_3xi16@gotpcrel32@hi+12
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v2, s30, 0			; GFX11-NEXT: v_writelane_b32 v2, s30, 0
				; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v2, s31, 1			; GFX11-NEXT: v_writelane_b32 v2, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v2, 1			; GFX11-NEXT: v_readlane_b32 s31, v2, 1
	; GFX11-NEXT: v_readlane_b32 s30, v2, 0			; GFX11-NEXT: v_readlane_b32 s30, v2, 0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: v_readlane_b32 s33, v2, 2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v2, off, s32 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v2, off, s32 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <3 x i16> @return_3xi16()			call amdgpu_gfx <3 x i16> @return_3xi16()
	ret void			ret void

	define amdgpu_gfx void @call_512xi32() #0 {			define amdgpu_gfx void @call_512xi32() #0 {
	; GFX9-LABEL: call_512xi32:			; GFX9-LABEL: call_512xi32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v2, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_add_i32 s33, s32, 0x1ffc0			; GFX9-NEXT: s_add_i32 s33, s32, 0x1ffc0
	; GFX9-NEXT: s_and_b32 s33, s33, 0xfffe0000			; GFX9-NEXT: s_and_b32 s33, s33, 0xfffe0000
	; GFX9-NEXT: s_add_i32 s32, s32, 0x60000			; GFX9-NEXT: s_add_i32 s32, s32, 0x60000
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_add_i32 s32, s32, 0xfffa0000			; GFX9-NEXT: s_add_i32 s32, s32, 0xfffa0000
	; GFX9-NEXT: v_readlane_b32 s33, v2, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_512xi32:			; GFX10-LABEL: call_512xi32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v2, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_add_i32 s33, s32, 0xffe0			; GFX10-NEXT: s_add_i32 s33, s32, 0xffe0
	; GFX10-NEXT: s_add_i32 s32, s32, 0x30000			; GFX10-NEXT: s_add_i32 s32, s32, 0x30000
	; GFX10-NEXT: s_and_b32 s33, s33, 0xffff0000			; GFX10-NEXT: s_and_b32 s33, s33, 0xffff0000
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_add_i32 s32, s32, 0xfffd0000			; GFX10-NEXT: s_add_i32 s32, s32, 0xfffd0000
	; GFX10-NEXT: v_readlane_b32 s33, v2, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_512xi32:			; GFX11-LABEL: call_512xi32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v5, s32 offset:2048 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v5, s32 offset:2048 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v5, s33, 2			; GFX11-NEXT: s_mov_b32 s34, s33
	; GFX11-NEXT: s_add_i32 s33, s32, 0x7ff			; GFX11-NEXT: s_add_i32 s33, s32, 0x7ff
	; GFX11-NEXT: s_addk_i32 s32, 0x1800			; GFX11-NEXT: s_addk_i32 s32, 0x1800
	; GFX11-NEXT: s_and_b32 s33, s33, 0xfffff800			; GFX11-NEXT: s_and_b32 s33, s33, 0xfffff800
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_512xi32@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_512xi32@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_512xi32@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_512xi32@gotpcrel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v5, s30, 0			; GFX11-NEXT: v_writelane_b32 v5, s30, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v0, s33			; GFX11-NEXT: v_mov_b32_e32 v0, s33
	; GFX11-NEXT: v_writelane_b32 v5, s31, 1			; GFX11-NEXT: v_writelane_b32 v5, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v5, 1			; GFX11-NEXT: v_readlane_b32 s31, v5, 1
	; GFX11-NEXT: v_readlane_b32 s30, v5, 0			; GFX11-NEXT: v_readlane_b32 s30, v5, 0
	; GFX11-NEXT: s_addk_i32 s32, 0xe800			; GFX11-NEXT: s_addk_i32 s32, 0xe800
	; GFX11-NEXT: v_readlane_b32 s33, v5, 2			; GFX11-NEXT: s_mov_b32 s33, s34
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v5, off, s32 offset:2048 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v5, off, s32 offset:2048 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <512 x i32> @return_512xi32()			call amdgpu_gfx <512 x i32> @return_512xi32()
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/indirect-call.ll

	[... 389 lines not shown ...]
	}			}

	define void @test_indirect_call_vgpr_ptr(void()* %fptr) {			define void @test_indirect_call_vgpr_ptr(void()* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr:			; GCN-LABEL: test_indirect_call_vgpr_ptr:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 18			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 52 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 18			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr:			; GISEL-LABEL: test_indirect_call_vgpr_ptr:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 18			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 52 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 18			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void %fptr()			call void %fptr()
	ret void			ret void
	}			}

	define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {			define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 18			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 55 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 18			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 18			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 53 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 18			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void %fptr(i32 123)			call void %fptr(i32 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:			; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 18			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 54 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 18			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_ret:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_ret:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 18			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 54 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 18			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	%a = call i32 %fptr()			%a = call i32 %fptr()
	%b = add i32 %a, 1			%b = add i32 %a, 1
	ret i32 %b			ret i32 %b
	}			}

	define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {			define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:			; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 20			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 63 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 20			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_in_branch:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_in_branch:
	; GISEL: ; %bb.0: ; %bb0			; GISEL: ; %bb.0: ; %bb0
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 20			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 63 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 20			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	br i1 %cond, label %bb1, label %bb2			br i1 %cond, label %bb1, label %bb2

	bb1:			bb1:
	call void %fptr()			call void %fptr()
	br label %bb2			br label %bb2

	bb2:			bb2:
	ret void			ret void
	}			}

	define void @test_indirect_call_vgpr_ptr_inreg_arg(void(i32)* %fptr) {			define void @test_indirect_call_vgpr_ptr_inreg_arg(void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:			; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s5, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 64 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s5
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s5, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 64 lines elided ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s5
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void %fptr(i32 inreg 123)			call amdgpu_gfx void %fptr(i32 inreg 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, void(i32)* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s10, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	[... 68 lines elided ...]
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s10, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	[... 68 lines elided ...]
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void %fptr(i32 %i)			call amdgpu_gfx void %fptr(i32 %i)
	ret i32 %i			ret i32 %i
	}			}

	; Use a variable inside a waterfall loop and use the return variable after the loop.			; Use a variable inside a waterfall loop and use the return variable after the loop.
	; TODO The argument and return variable could be in the same physical register, but the register			; TODO The argument and return variable could be in the same physical register, but the register
	; allocator is not able to do that because the return value clashes with the liverange of an			; allocator is not able to do that because the return value clashes with the liverange of an
	; IMPLICIT_DEF of the argument.			; IMPLICIT_DEF of the argument.
	define i32 @test_indirect_call_vgpr_ptr_arg_and_return(i32 %i, i32(i32)* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_arg_and_return(i32 %i, i32(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s10, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 66 lines elided ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s10, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 66 lines elided ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	%ret = call amdgpu_gfx i32 %fptr(i32 %i)			%ret = call amdgpu_gfx i32 %fptr(i32 %i)
	ret i32 %ret			ret i32 %ret
	}			}

	; Calling a vgpr can never be a tail call.			; Calling a vgpr can never be a tail call.
	define void @test_indirect_tail_call_vgpr_ptr(void()* %fptr) {			define void @test_indirect_tail_call_vgpr_ptr(void()* %fptr) {
	; GCN-LABEL: test_indirect_tail_call_vgpr_ptr:			; GCN-LABEL: test_indirect_tail_call_vgpr_ptr:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s10, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 63 lines elided ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:			; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s10, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 63 lines elided ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	tail call amdgpu_gfx void %fptr()			tail call amdgpu_gfx void %fptr()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

	[... 184 lines elided ...]
	}			}

	define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {			define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {
	; GFX9-LABEL: slsr1_1:			; GFX9-LABEL: slsr1_1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 5
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
				; GFX9-NEXT: v_writelane_b32 v44, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s34, 2			; GFX9-NEXT: v_writelane_b32 v40, s34, 2
	; GFX9-NEXT: v_writelane_b32 v40, s36, 3			; GFX9-NEXT: v_writelane_b32 v40, s36, 3
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s37, 4			; GFX9-NEXT: v_writelane_b32 v40, s37, 4
	[... 19 lines elided ...]
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s37, v40, 4			; GFX9-NEXT: v_readlane_b32 s37, v40, 4
	; GFX9-NEXT: v_readlane_b32 s36, v40, 3			; GFX9-NEXT: v_readlane_b32 s36, v40, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 5			; GFX9-NEXT: v_readlane_b32 s33, v44, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%b = and i32 %b.arg, 16777215			%b = and i32 %b.arg, 16777215
	%s = and i32 %s.arg, 16777215			%s = and i32 %s.arg, 16777215

	; CHECK-LABEL: @slsr1(			; CHECK-LABEL: @slsr1(
	; foo(b * s);			; foo(b * s);
	[... 26 lines elided ...]

llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll

	[... 24 lines elided ...]
	; redundant spills of s33 or assert.			; redundant spills of s33 or assert.
	define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {			define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {
	; CHECK-LABEL: csr_vgpr_spill_fp_callee:			; CHECK-LABEL: csr_vgpr_spill_fp_callee:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: v_writelane_b32 v1, s33, 2			; CHECK-NEXT: s_mov_b32 s6, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: v_writelane_b32 v1, s30, 0			; CHECK-NEXT: v_writelane_b32 v1, s30, 0
	; CHECK-NEXT: v_writelane_b32 v1, s31, 1			; CHECK-NEXT: v_writelane_b32 v1, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ; clobber csr v40			; CHECK-NEXT: ; clobber csr v40
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v1, 2			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	call fastcc void @callee_has_fp()			call fastcc void @callee_has_fp()
	call void asm sideeffect "; clobber csr v40", "~{v40}"()			call void asm sideeffect "; clobber csr v40", "~{v40}"()
	[... 87 lines elided ...]

	define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {			define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:			; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: v_writelane_b32 v1, s33, 2			; CHECK-NEXT: s_mov_b32 s6, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v1, s30, 0			; CHECK-NEXT: v_writelane_b32 v1, s30, 0
	; CHECK-NEXT: v_writelane_b32 v1, s31, 1			; CHECK-NEXT: v_writelane_b32 v1, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, tail_call@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, tail_call@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, tail_call@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, tail_call@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v1, 2			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @tail_call()			%call = call i32 @tail_call()
	ret i32 %call			ret i32 %call
	}			}

	define hidden i32 @caller_save_vgpr_spill_fp() #0 {			define hidden i32 @caller_save_vgpr_spill_fp() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp:			; CHECK-LABEL: caller_save_vgpr_spill_fp:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: v_writelane_b32 v2, s33, 2			; CHECK-NEXT: s_mov_b32 s7, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v2, s30, 0			; CHECK-NEXT: v_writelane_b32 v2, s30, 0
	; CHECK-NEXT: v_writelane_b32 v2, s31, 1			; CHECK-NEXT: v_writelane_b32 v2, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, caller_save_vgpr_spill_fp_tail_call@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, caller_save_vgpr_spill_fp_tail_call@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, caller_save_vgpr_spill_fp_tail_call@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, caller_save_vgpr_spill_fp_tail_call@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: v_readlane_b32 s31, v2, 1			; CHECK-NEXT: v_readlane_b32 s31, v2, 1
	; CHECK-NEXT: v_readlane_b32 s30, v2, 0			; CHECK-NEXT: v_readlane_b32 s30, v2, 0
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v2, 2			; CHECK-NEXT: s_mov_b32 s33, s7
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @caller_save_vgpr_spill_fp_tail_call()			%call = call i32 @caller_save_vgpr_spill_fp_tail_call()
	ret i32 %call			ret i32 %call

llvm/test/CodeGen/AMDGPU/nested-calls.ll

	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=hawaii -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=hawaii -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s

	; Test calls when called by other callable functions rather than			; Test calls when called by other callable functions rather than
	; kernels.			; kernels.

	declare void @external_void_func_i32(i32) #0			declare void @external_void_func_i32(i32) #0

	; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm:			; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm:
	; GCN: s_waitcnt			; GCN: s_waitcnt

	; Spill CSR VGPR used for SGPR spilling			; Spill CSR VGPR used for SGPR spilling
	; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-DAG: v_writelane_b32 v40, s33, 2			; GCN-DAG: v_writelane_b32 v41, s33, 0
	; GCN-DAG: s_mov_b32 s33, s32			; GCN-DAG: s_mov_b32 s33, s32
	; GCN-DAG: s_addk_i32 s32, 0x400			; GCN-DAG: s_addk_i32 s32, 0x400
	; GCN-DAG: v_writelane_b32 v40, s30, 0			; GCN-DAG: v_writelane_b32 v40, s30, 0
	; GCN-DAG: v_writelane_b32 v40, s31, 1			; GCN-DAG: v_writelane_b32 v40, s31, 1

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_readlane_b32 s31, v40, 1			; GCN: v_readlane_b32 s31, v40, 1
	; GCN: v_readlane_b32 s30, v40, 0			; GCN: v_readlane_b32 s30, v40, 0

	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @test_func_call_external_void_func_i32_imm() #0 {			define void @test_func_call_external_void_func_i32_imm() #0 {
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}


llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll

	; CHECK-NEXT: .file 0 "/tmp" "lane-info.cpp" md5 0x4ab9b75a30baffdf0f6f536a80e3e382			; CHECK-NEXT: .file 0 "/tmp" "lane-info.cpp" md5 0x4ab9b75a30baffdf0f6f536a80e3e382
	; CHECK-NEXT: .loc 0 30 0 ; lane-info.cpp:30:0			; CHECK-NEXT: .loc 0 30 0 ; lane-info.cpp:30:0
	; CHECK-NEXT: .cfi_sections .debug_frame			; CHECK-NEXT: .cfi_sections .debug_frame
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0: ; %entry			; CHECK-NEXT: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1			; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[16:17]			; CHECK-NEXT: s_mov_b64 exec, s[16:17]
	; CHECK-NEXT: v_writelane_b32 v40, s33, 2			; CHECK-NEXT: v_writelane_b32 v41, s33, 0
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: .Ltmp0:			; CHECK-NEXT: .Ltmp0:
	; CHECK-NEXT: .loc 0 31 3 prologue_end ; lane-info.cpp:31:3			; CHECK-NEXT: .loc 0 31 3 prologue_end ; lane-info.cpp:31:3
	; CHECK-NEXT: s_getpc_b64 s[16:17]			; CHECK-NEXT: s_getpc_b64 s[16:17]
	; CHECK-NEXT: s_add_u32 s16, s16, _ZL13sleep_foreverv@gotpcrel32@lo+4			; CHECK-NEXT: s_add_u32 s16, s16, _ZL13sleep_foreverv@gotpcrel32@lo+4
	; CHECK-NEXT: s_addc_u32 s17, s17, _ZL13sleep_foreverv@gotpcrel32@hi+12			; CHECK-NEXT: s_addc_u32 s17, s17, _ZL13sleep_foreverv@gotpcrel32@hi+12
	; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0			; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
	; CHECK-NEXT: s_mov_b64 s[22:23], s[2:3]			; CHECK-NEXT: s_mov_b64 s[22:23], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[20:21], s[0:1]			; CHECK-NEXT: s_mov_b64 s[20:21], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]			; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]			; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: .Ltmp1:			; CHECK-NEXT: .Ltmp1:
	; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1			; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: .loc 0 32 1 epilogue_begin is_stmt 0 ; lane-info.cpp:32:1			; CHECK-NEXT: .loc 0 32 1 epilogue_begin is_stmt 0 ; lane-info.cpp:32:1
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v40, 2			; CHECK-NEXT: v_readlane_b32 s33, v41, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	entry:			entry:
	call void @_ZL13sleep_foreverv(), !dbg !1646			call void @_ZL13sleep_foreverv(), !dbg !1646
	ret void, !dbg !1647			ret void, !dbg !1647
	}			}

llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir

machineFunctionInfo:
frameOffsetReg: $sgpr33		frameOffsetReg: $sgpr33
stackPtrOffsetReg: $sgpr32		stackPtrOffsetReg: $sgpr32

body: \|		body: \|
bb.0:		bb.0:
liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255		liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255

; GFX8-LABEL: name: pei_scavenge_vgpr_spill		; GFX8-LABEL: name: pei_scavenge_vgpr_spill
; GFX8: liveins: $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr2		
; GFX8: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX8-NEXT: {{ $}}		; GFX8-NEXT: {{ $}}
; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX8-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)		; GFX8-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX8-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; GFX8-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; GFX8-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc		; GFX8-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
; GFX8-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc		; GFX8-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc
; GFX8-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc		; GFX8-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc
; GFX8-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX8-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX8-NEXT: $vcc_lo = S_MOV_B32 8192		; GFX8-NEXT: $vcc_lo = S_MOV_B32 8192
; GFX8-NEXT: $vgpr0, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr0, 0, implicit $exec		; GFX8-NEXT: $vgpr0, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr0, 0, implicit $exec
; GFX8-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX8-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX8-NEXT: $vcc_lo = S_MOV_B32 16384		; GFX8-NEXT: $vcc_lo = S_MOV_B32 16384
; GFX8-NEXT: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec		; GFX8-NEXT: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec
; GFX8-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec		; GFX8-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
; GFX8-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; GFX8-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc
; GFX8-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0		; GFX8-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0
; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX8-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX8-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX8-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX8-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX9-LABEL: name: pei_scavenge_vgpr_spill		; GFX9-LABEL: name: pei_scavenge_vgpr_spill
; GFX9: liveins: $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr2		
; GFX9: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX9-NEXT: {{ $}}		; GFX9-NEXT: {{ $}}
; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)		; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX9-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; GFX9-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; GFX9-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc		; GFX9-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
; GFX9-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc		; GFX9-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc
; GFX9-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc		; GFX9-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc
; GFX9-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9-NEXT: $vgpr0 = V_ADD_U32_e32 8192, killed $vgpr0, implicit $exec		; GFX9-NEXT: $vgpr0 = V_ADD_U32_e32 8192, killed $vgpr0, implicit $exec
; GFX9-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec		; GFX9-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec
; GFX9-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec		; GFX9-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
; GFX9-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; GFX9-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc
; GFX9-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0		; GFX9-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0
; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX9-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX9-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX9-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX9-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill		; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill
; GFX9-FLATSCR: liveins: $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr2		
; GFX9-FLATSCR: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-FLATSCR-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 16388, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 16388, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)		; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)
; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX9-FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; GFX9-FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc		; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc

llvm/test/CodeGen/AMDGPU/save-fp.ll

	; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX908 %s			; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX908 %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX900 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX900 %s

	define void @foo() {			define void @foo() {
	bb:			bb:
	ret void			ret void
	}			}

	; FIXME: We spill v40 into AGPR, but still save and restore FP			; FIXME: We spill v40 into AGPR, but still save and restore FP
	; which is not needed in this case.			; which is not needed in this case.

	; GCN-LABEL: {{^}}caller:			; GCN-LABEL: {{^}}caller:

	; GCN: v_writelane_b32 v2, s33, 2			; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GFX900: buffer_store_dword			; GFX900: buffer_store_dword
	; GFX908-DAG: v_accvgpr_write_b32			; GFX908-DAG: v_accvgpr_write_b32
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GFX900: buffer_load_dword			; GFX900: buffer_load_dword
	; GFX908: v_accvgpr_read_b32			; GFX908: v_accvgpr_read_b32
	; GCN: v_readlane_b32 s33, v2, 2			; GCN: s_mov_b32 s33, [[TMP_SGPR]]
	define i64 @caller() {			define i64 @caller() {
	bb:			bb:
	call void asm sideeffect "", "~{v40}" ()			call void asm sideeffect "", "~{v40}" ()
	tail call void @foo()			tail call void @foo()
	ret i64 0			ret i64 0
	}			}

llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll

	define void @spill_sgpr_with_no_lower_vgpr_available() #0 {			define void @spill_sgpr_with_no_lower_vgpr_available() #0 {
	; GCN-LABEL: spill_sgpr_with_no_lower_vgpr_available:			; GCN-LABEL: spill_sgpr_with_no_lower_vgpr_available:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s32 offset:448 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s32 offset:448 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v255, s33, 2			; GCN-NEXT: s_mov_b32 s6, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_i32 s32, s32, 0x7400			; GCN-NEXT: s_add_i32 s32, s32, 0x7400
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill
	[... 228 lines not shown ...]
	; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00			; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00
	; GCN-NEXT: v_readlane_b32 s33, v255, 2			; GCN-NEXT: s_mov_b32 s33, s6
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s32 offset:448 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s32 offset:448 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca

	define void @spill_to_lowest_available_vgpr() #0 {			define void @spill_to_lowest_available_vgpr() #0 {
	; GCN-LABEL: spill_to_lowest_available_vgpr:			; GCN-LABEL: spill_to_lowest_available_vgpr:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s32 offset:444 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s32 offset:444 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v254, s33, 2			; GCN-NEXT: s_mov_b32 s6, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_i32 s32, s32, 0x7400			; GCN-NEXT: s_add_i32 s32, s32, 0x7400
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Spill
	[... 226 lines not shown ...]
	; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:412 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:412 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00			; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00
	; GCN-NEXT: v_readlane_b32 s33, v254, 2			; GCN-NEXT: s_mov_b32 s33, s6
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s32 offset:444 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s32 offset:444 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca

llvm/test/CodeGen/AMDGPU/sibling-call.ll

entry:
%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)		%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)
ret i32 %ret		ret i32 %ret
}		}

; Have another non-tail in the function		; Have another non-tail in the function
; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:		; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:
; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1		; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1
; GCN-NEXT: buffer_store_dword [[CSRV:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword [[CSRV:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
		; GCN-NEXT: buffer_store_dword [[CSRV_1:v[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec		; GCN-NEXT: s_mov_b64 exec
; GCN: v_writelane_b32 [[CSRV]], s33, 2		; GCN: v_writelane_b32 [[CSRV_1]], s33, 0
; GCN-DAG: s_addk_i32 s32, 0x400		; GCN-DAG: s_addk_i32 s32, 0x800

; GCN-DAG: s_getpc_b64 s[4:5]		; GCN-DAG: s_getpc_b64 s[4:5]
; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4		; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4
; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12		; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12

; GCN-DAG: v_writelane_b32 [[CSRV]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSRV]], s30, 0
; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-DAG: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-DAG: v_writelane_b32 [[CSRV]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSRV]], s31, 1


; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-DAG: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload

; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12		; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12

; GCN-DAG: v_readlane_b32 s30, [[CSRV]], 0		; GCN-DAG: v_readlane_b32 s30, [[CSRV]], 0
; GCN-DAG: v_readlane_b32 s31, [[CSRV]], 1		; GCN-DAG: v_readlane_b32 s31, [[CSRV]], 1

; GCN: s_addk_i32 s32, 0xfc00		; GCN: s_addk_i32 s32, 0xf800
; GCN-NEXT: v_readlane_b32 s33,		; GCN-NEXT: v_readlane_b32 s33,
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
		; GCN-NEXT: buffer_load_dword [[CSRV_1]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {		define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {
entry:		entry:
%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)		%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)
%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)		%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)
ret i32 %ret		ret i32 %ret
}		}

llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s \| FileCheck -check-prefix=GCN %s

	; GCN-LABEL: {{^}}spill_csr_s5_copy:			; GCN-LABEL: {{^}}spill_csr_s5_copy:
	; GCN: s_or_saveexec_b64			; GCN: s_or_saveexec_b64
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec			; GCN-NEXT: s_mov_b64 exec
	; GCN: v_writelane_b32 v40, s33, 3			; GCN: v_writelane_b32 v41, s33, 0
	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9			; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9
	; GCN: buffer_store_dword [[K]], off, s[0:3], s33{{$}}			; GCN: buffer_store_dword [[K]], off, s[0:3], s33{{$}}

	; GCN: v_readlane_b32 s33, v40, 3			; GCN: v_readlane_b32 s33, v41, 0
	; GCN: s_or_saveexec_b64			; GCN: s_or_saveexec_b64
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN: s_mov_b64 exec			; GCN: s_mov_b64 exec
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @spill_csr_s5_copy() #0 {			define void @spill_csr_s5_copy() #0 {
	bb:			bb:
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	%tmp = tail call i64 @func() #1			%tmp = tail call i64 @func() #1
	%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp			%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp
	%tmp2 = load i32, i32 addrspace(1)* %tmp1, align 4			%tmp2 = load i32, i32 addrspace(1)* %tmp1, align 4

llvm/test/CodeGen/AMDGPU/stack-realign.ll

	define void @func_call_align1024_bp_gets_vgpr_spill(<32 x i32> %a, i32 %b) #0 {			define void @func_call_align1024_bp_gets_vgpr_spill(<32 x i32> %a, i32 %b) #0 {
	; The test forces the stack to be realigned to a new boundary			; The test forces the stack to be realigned to a new boundary
	; since there is a local object with an alignment of 1024.			; since there is a local object with an alignment of 1024.
	; Should use BP to access the incoming stack arguments.			; Should use BP to access the incoming stack arguments.
	; The BP value is saved/restored with a VGPR spill.			; The BP value is saved/restored with a VGPR spill.

	; GCN-LABEL: func_call_align1024_bp_gets_vgpr_spill:			; GCN-LABEL: func_call_align1024_bp_gets_vgpr_spill:
	; GCN: buffer_store_dword [[VGPR_REG:v[0-9]+]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VGPR_REG:v[0-9]+]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword [[VGPR_REG_1:v[0-9]+]], off, s[0:3], s32 offset:1032 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 [[VGPR_REG]], s33, 2			; GCN-NEXT: v_writelane_b32 [[VGPR_REG_1]], s33, 0
	; GCN-DAG: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xffc0			; GCN-DAG: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xffc0
	; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffff0000			; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffff0000
	; GCN: v_mov_b32_e32 v32, 0			; GCN: v_mov_b32_e32 v32, 0
	; GCN-DAG: v_writelane_b32 [[VGPR_REG]], s34, 3			; GCN-DAG: v_writelane_b32 [[VGPR_REG_1]], s34, 1
	; GCN: s_mov_b32 s34, s32			; GCN: s_mov_b32 s34, s32
	; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024			; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34			; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34
	; GCN-DAG: s_add_i32 s32, s32, 0x30000			; GCN-DAG: s_add_i32 s32, s32, 0x30000
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32
	; GCN: s_swappc_b64 s[30:31],			; GCN: s_swappc_b64 s[30:31],

	; GCN: v_readlane_b32 s31, [[VGPR_REG]], 1			; GCN: v_readlane_b32 s31, [[VGPR_REG]], 1
	; GCN: v_readlane_b32 s30, [[VGPR_REG]], 0			; GCN: v_readlane_b32 s30, [[VGPR_REG]], 0
	; GCN: s_add_i32 s32, s32, 0xfffd0000			; GCN: s_add_i32 s32, s32, 0xfffd0000
	; GCN-NEXT: v_readlane_b32 s33, [[VGPR_REG]], 2			; GCN-NEXT: v_readlane_b32 s33, [[VGPR_REG_1]], 0
	; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG]], 3			; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG_1]], 1
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword [[VGPR_REG_1]], off, s[0:3], s32 offset:1032 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN: s_setpc_b64 s[30:31]			; GCN: s_setpc_b64 s[30:31]
	%temp = alloca i32, align 1024, addrspace(5)			%temp = alloca i32, align 1024, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %temp, align 1024			store volatile i32 0, ptr addrspace(5) %temp, align 1024
	call void @extern_func(<32 x i32> %a, i32 %b)			call void @extern_func(<32 x i32> %a, i32 %b)
	ret void			ret void
	}			}
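
The realignment constants checked in func_call_align1024_bp_gets_vgpr_spill above (0xffc0 and 0xffff0000) can be reproduced with a little arithmetic. The sketch below is not part of the patch. It assumes a wave64 target using buffer-based scratch, where the stack pointer is kept in wave-scaled units (per-lane bytes times 64); the helper names realign_masks and realign are hypothetical.

```python
# Sketch only: reproduces the add/and immediates that the CHECK lines expect.
# Assumption: the stack pointer is scaled by the wavefront size (64 for wave64).
WAVE_SIZE = 64


def realign_masks(align_bytes, scale=WAVE_SIZE):
    """Return (addend, and_mask) for rounding a 32-bit SP up to align_bytes * scale."""
    scaled_align = align_bytes * scale
    addend = (align_bytes - 1) * scale
    # Clear the low bits below the scaled alignment, truncated to 32 bits.
    and_mask = ~(scaled_align - 1) & 0xFFFFFFFF
    return addend, and_mask


def realign(sp, align_bytes, scale=WAVE_SIZE):
    """Apply the s_add_i32 / s_and_b32 pair to a stack pointer value."""
    addend, and_mask = realign_masks(align_bytes, scale)
    return ((sp + addend) & 0xFFFFFFFF) & and_mask


addend, and_mask = realign_masks(1024)
print(hex(addend), hex(and_mask))
```

For an alignment of 1024 this yields the pair used by s_add_i32 and s_and_b32 in the check lines. Passing scale=1 models an unscaled stack pointer, which is an assumption about how flat-scratch targets behave rather than something this test shows.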

llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll

	define amdgpu_gfx float @caller(float %arg0) {			define amdgpu_gfx float @caller(float %arg0) {
	; GCN-LABEL: caller:			; GCN-LABEL: caller:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1			; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[34:35]			; GCN-NEXT: s_mov_b64 exec, s[34:35]
	; GCN-NEXT: v_writelane_b32 v1, s33, 3
	; GCN-NEXT: v_writelane_b32 v1, s4, 0			; GCN-NEXT: v_writelane_b32 v1, s4, 0
				; GCN-NEXT: s_mov_b32 s36, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v1, s30, 1			; GCN-NEXT: v_writelane_b32 v1, s30, 1
	; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0			; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
	; GCN-NEXT: s_mov_b32 s4, 2.0			; GCN-NEXT: s_mov_b32 s4, 2.0
	; GCN-NEXT: v_writelane_b32 v1, s31, 2			; GCN-NEXT: v_writelane_b32 v1, s31, 2
	; GCN-NEXT: s_getpc_b64 s[34:35]			; GCN-NEXT: s_getpc_b64 s[34:35]
	; GCN-NEXT: s_add_u32 s34, s34, callee@rel32@lo+4			; GCN-NEXT: s_add_u32 s34, s34, callee@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s35, s35, callee@rel32@hi+12			; GCN-NEXT: s_addc_u32 s35, s35, callee@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GCN-NEXT: v_readlane_b32 s31, v1, 2			; GCN-NEXT: v_readlane_b32 s31, v1, 2
	; GCN-NEXT: v_readlane_b32 s30, v1, 1			; GCN-NEXT: v_readlane_b32 s30, v1, 1
	; GCN-NEXT: v_readlane_b32 s4, v1, 0			; GCN-NEXT: v_readlane_b32 s4, v1, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v1, 3			; GCN-NEXT: s_mov_b32 s33, s36
	; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1			; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[34:35]			; GCN-NEXT: s_mov_b64 exec, s[34:35]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%add = fadd float %arg0, 1.0			%add = fadd float %arg0, 1.0
	%call = tail call amdgpu_gfx float @callee(float %add, float inreg 2.0)			%call = tail call amdgpu_gfx float @callee(float %add, float inreg 2.0)
	ret float %call			ret float %call
	}			}

llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=1 -o - %s \| FileCheck -check-prefixes=GFX90A,GLOBALNESS1 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=1 -o - %s \| FileCheck -check-prefixes=GFX90A,GLOBALNESS1 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=0 -o - %s \| FileCheck -check-prefixes=GFX90A,GLOBALNESS0 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=0 -o - %s \| FileCheck -check-prefixes=GFX90A,GLOBALNESS0 %s

	declare void @wobble()			declare void @wobble()

	define internal fastcc void @widget() {			define internal fastcc void @widget() {
	; GFX90A-LABEL: widget:			; GFX90A-LABEL: widget:
	; GFX90A: ; %bb.0: ; %bb			; GFX90A: ; %bb.0: ; %bb
	; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX90A-NEXT: s_or_saveexec_b64 s[16:17], -1			; GFX90A-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GFX90A-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX90A-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX90A-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX90A-NEXT: s_mov_b64 exec, s[16:17]			; GFX90A-NEXT: s_mov_b64 exec, s[16:17]
	; GFX90A-NEXT: v_writelane_b32 v40, s33, 2			; GFX90A-NEXT: v_writelane_b32 v41, s33, 0
	; GFX90A-NEXT: s_mov_b32 s33, s32			; GFX90A-NEXT: s_mov_b32 s33, s32
	; GFX90A-NEXT: s_addk_i32 s32, 0x400			; GFX90A-NEXT: s_addk_i32 s32, 0x400
	; GFX90A-NEXT: s_getpc_b64 s[16:17]			; GFX90A-NEXT: s_getpc_b64 s[16:17]
	; GFX90A-NEXT: s_add_u32 s16, s16, wobble@gotpcrel32@lo+4			; GFX90A-NEXT: s_add_u32 s16, s16, wobble@gotpcrel32@lo+4
	; GFX90A-NEXT: s_addc_u32 s17, s17, wobble@gotpcrel32@hi+12			; GFX90A-NEXT: s_addc_u32 s17, s17, wobble@gotpcrel32@hi+12
	; GFX90A-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0			; GFX90A-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
	; GFX90A-NEXT: v_writelane_b32 v40, s30, 0			; GFX90A-NEXT: v_writelane_b32 v40, s30, 0
	; GFX90A-NEXT: v_writelane_b32 v40, s31, 1			; GFX90A-NEXT: v_writelane_b32 v40, s31, 1
	; GFX90A-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GFX90A-NEXT: s_swappc_b64 s[30:31], s[16:17]
	bb:			bb:
	tail call void @wobble()			tail call void @wobble()
	unreachable			unreachable
	}			}

	define amdgpu_kernel void @kernel(i32 addrspace(1)* %arg1.global, i1 %tmp3.i.i, i32 %tmp5.i.i, i32 %tmp427.i, i1 %tmp438.i, double %tmp27.i, i1 %tmp48.i) {			define amdgpu_kernel void @kernel(i32 addrspace(1)* %arg1.global, i1 %tmp3.i.i, i32 %tmp5.i.i, i32 %tmp427.i, i1 %tmp438.i, double %tmp27.i, i1 %tmp48.i) {
	; GLOBALNESS1-LABEL: kernel:			; GLOBALNESS1-LABEL: kernel:
	; GLOBALNESS1: ; %bb.0: ; %bb			; GLOBALNESS1: ; %bb.0: ; %bb
	; GLOBALNESS1-NEXT: s_mov_b64 s[54:55], s[6:7]			; GLOBALNESS1-NEXT: s_mov_b64 s[54:55], s[6:7]
	; GLOBALNESS1-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0			; GLOBALNESS1-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0
	; GLOBALNESS1-NEXT: s_load_dword s6, s[8:9], 0x14			; GLOBALNESS1-NEXT: s_load_dword s6, s[8:9], 0x14
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v42, v0			; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, v0
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v44, 0			; GLOBALNESS1-NEXT: v_mov_b32_e32 v40, 0
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], 0, 0
	; GLOBALNESS1-NEXT: global_store_dword v[0:1], v44, off			; GLOBALNESS1-NEXT: global_store_dword v[0:1], v40, off
	; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS1-NEXT: global_load_dword v0, v44, s[36:37]			; GLOBALNESS1-NEXT: global_load_dword v0, v40, s[36:37]
	; GLOBALNESS1-NEXT: s_add_u32 flat_scratch_lo, s12, s17			; GLOBALNESS1-NEXT: s_add_u32 flat_scratch_lo, s12, s17
	; GLOBALNESS1-NEXT: s_mov_b64 s[64:65], s[4:5]			; GLOBALNESS1-NEXT: s_mov_b64 s[64:65], s[4:5]
	; GLOBALNESS1-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18			; GLOBALNESS1-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18
	; GLOBALNESS1-NEXT: s_load_dword s7, s[8:9], 0x20			; GLOBALNESS1-NEXT: s_load_dword s7, s[8:9], 0x20
	; GLOBALNESS1-NEXT: s_addc_u32 flat_scratch_hi, s13, 0			; GLOBALNESS1-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
	; GLOBALNESS1-NEXT: s_add_u32 s0, s0, s17			; GLOBALNESS1-NEXT: s_add_u32 s0, s0, s17
	; GLOBALNESS1-NEXT: s_addc_u32 s1, s1, 0			; GLOBALNESS1-NEXT: s_addc_u32 s1, s1, 0
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v45, 0x40994400			; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, 0x40994400
	; GLOBALNESS1-NEXT: s_bitcmp1_b32 s38, 0			; GLOBALNESS1-NEXT: s_bitcmp1_b32 s38, 0
	; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[36:37], s[4:5], v[44:45]			; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[36:37], s[4:5], v[40:41]
	; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], 0			; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], 0
	; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS1-NEXT: s_xor_b64 s[94:95], s[4:5], -1			; GLOBALNESS1-NEXT: s_xor_b64 s[94:95], s[4:5], -1
	; GLOBALNESS1-NEXT: s_bitcmp1_b32 s6, 0			; GLOBALNESS1-NEXT: s_bitcmp1_b32 s6, 0
	; GLOBALNESS1-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]			; GLOBALNESS1-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]
	; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS1-NEXT: s_xor_b64 s[88:89], s[4:5], -1			; GLOBALNESS1-NEXT: s_xor_b64 s[88:89], s[4:5], -1
	; GLOBALNESS1-NEXT: s_bitcmp1_b32 s7, 0			; GLOBALNESS1-NEXT: s_bitcmp1_b32 s7, 0
	[... 10 lines not shown ...]
	; GLOBALNESS1-NEXT: s_mov_b64 s[34:35], s[10:11]			; GLOBALNESS1-NEXT: s_mov_b64 s[34:35], s[10:11]
	; GLOBALNESS1-NEXT: s_mov_b64 s[92:93], 0x80			; GLOBALNESS1-NEXT: s_mov_b64 s[92:93], 0x80
	; GLOBALNESS1-NEXT: v_cmp_ne_u32_e64 s[42:43], 1, v1			; GLOBALNESS1-NEXT: v_cmp_ne_u32_e64 s[42:43], 1, v1
	; GLOBALNESS1-NEXT: s_mov_b32 s69, 0x3ff00000			; GLOBALNESS1-NEXT: s_mov_b32 s69, 0x3ff00000
	; GLOBALNESS1-NEXT: s_mov_b32 s32, 0			; GLOBALNESS1-NEXT: s_mov_b32 s32, 0
	; GLOBALNESS1-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63			; GLOBALNESS1-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63
	; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0			; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v41, s4, 0			; GLOBALNESS1-NEXT: v_writelane_b32 v42, s4, 0
	; GLOBALNESS1-NEXT: v_writelane_b32 v41, s5, 1			; GLOBALNESS1-NEXT: v_writelane_b32 v42, s5, 1
	; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0			; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v41, s4, 2			; GLOBALNESS1-NEXT: v_writelane_b32 v42, s4, 2
	; GLOBALNESS1-NEXT: v_writelane_b32 v41, s5, 3			; GLOBALNESS1-NEXT: v_writelane_b32 v42, s5, 3
	; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0			; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v41, s4, 4			; GLOBALNESS1-NEXT: v_writelane_b32 v42, s4, 4
	; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0			; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v41, s5, 5			; GLOBALNESS1-NEXT: v_writelane_b32 v42, s5, 5
	; GLOBALNESS1-NEXT: s_branch .LBB1_4			; GLOBALNESS1-NEXT: s_branch .LBB1_4
	; GLOBALNESS1-NEXT: .LBB1_1: ; %bb70.i			; GLOBALNESS1-NEXT: .LBB1_1: ; %bb70.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_readlane_b32 s6, v41, 4			; GLOBALNESS1-NEXT: v_readlane_b32 s6, v42, 4
	; GLOBALNESS1-NEXT: v_readlane_b32 s7, v41, 5			; GLOBALNESS1-NEXT: v_readlane_b32 s7, v42, 5
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_29			; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_29
	; GLOBALNESS1-NEXT: .LBB1_2: ; %Flow6			; GLOBALNESS1-NEXT: .LBB1_2: ; %Flow6
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[4:5]			; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], 0			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], 0
	; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS1-NEXT: .LBB1_3: ; %Flow19			; GLOBALNESS1-NEXT: .LBB1_3: ; %Flow19
	[31 lines not shown]
	; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a34, v2			; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a34, v2
	; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a33, v1			; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a33, v1
	; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a32, v0			; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a32, v0
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_30			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_30
	; GLOBALNESS1-NEXT: .LBB1_4: ; %bb5			; GLOBALNESS1-NEXT: .LBB1_4: ; %bb5
	; GLOBALNESS1-NEXT: ; =>This Loop Header: Depth=1			; GLOBALNESS1-NEXT: ; =>This Loop Header: Depth=1
	; GLOBALNESS1-NEXT: ; Child Loop BB1_15 Depth 2			; GLOBALNESS1-NEXT: ; Child Loop BB1_15 Depth 2
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]
	; GLOBALNESS1-NEXT: flat_load_dword v40, v[0:1]			; GLOBALNESS1-NEXT: flat_load_dword v44, v[0:1]
	; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40
	; GLOBALNESS1-NEXT: buffer_store_dword v44, off, s[0:3], 0			; GLOBALNESS1-NEXT: buffer_store_dword v40, off, s[0:3], 0
	; GLOBALNESS1-NEXT: flat_load_dword v43, v[0:1]			; GLOBALNESS1-NEXT: flat_load_dword v45, v[0:1]
	; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[42:43]			; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[42:43]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], -1			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], -1
	; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_8			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_8
	; GLOBALNESS1-NEXT: ; %bb.5: ; %NodeBlock			; GLOBALNESS1-NEXT: ; %bb.5: ; %NodeBlock
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	[53 lines not shown]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]
	; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]			; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]
	; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_26			; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_26
	; GLOBALNESS1-NEXT: ; %bb.10: ; %bb33.i			; GLOBALNESS1-NEXT: ; %bb.10: ; %bb33.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: global_load_dwordx2 v[0:1], v[32:33], off			; GLOBALNESS1-NEXT: global_load_dwordx2 v[0:1], v[32:33], off
	; GLOBALNESS1-NEXT: v_readlane_b32 s4, v41, 0			; GLOBALNESS1-NEXT: v_readlane_b32 s4, v42, 0
	; GLOBALNESS1-NEXT: v_readlane_b32 s5, v41, 1			; GLOBALNESS1-NEXT: v_readlane_b32 s5, v42, 1
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_12			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_12
	; GLOBALNESS1-NEXT: ; %bb.11: ; %bb39.i			; GLOBALNESS1-NEXT: ; %bb.11: ; %bb39.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[2:3], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[2:3], 0, 0
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[2:3], v[44:45], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[2:3], v[40:41], off
	; GLOBALNESS1-NEXT: .LBB1_12: ; %bb44.lr.ph.i			; GLOBALNESS1-NEXT: .LBB1_12: ; %bb44.lr.ph.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_cmp_ne_u32_e32 vcc, 0, v43			; GLOBALNESS1-NEXT: v_cmp_ne_u32_e32 vcc, 0, v45
	; GLOBALNESS1-NEXT: v_cndmask_b32_e32 v2, 0, v40, vcc			; GLOBALNESS1-NEXT: v_cndmask_b32_e32 v2, 0, v44, vcc
	; GLOBALNESS1-NEXT: s_mov_b64 s[72:73], s[42:43]			; GLOBALNESS1-NEXT: s_mov_b64 s[72:73], s[42:43]
	; GLOBALNESS1-NEXT: s_mov_b32 s75, s39			; GLOBALNESS1-NEXT: s_mov_b32 s75, s39
	; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS1-NEXT: v_cmp_nlt_f64_e64 s[56:57], 0, v[0:1]			; GLOBALNESS1-NEXT: v_cmp_nlt_f64_e64 s[56:57], 0, v[0:1]
	; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2			; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2
	; GLOBALNESS1-NEXT: s_branch .LBB1_15			; GLOBALNESS1-NEXT: s_branch .LBB1_15
	; GLOBALNESS1-NEXT: .LBB1_13: ; %Flow7			; GLOBALNESS1-NEXT: .LBB1_13: ; %Flow7
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	[32 lines not shown]
	; GLOBALNESS1-NEXT: s_addc_u32 s61, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s61, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]			; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[46:47], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[44:45], 0, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]			; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[46:47], a[32:33], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[44:45], a[32:33], off
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]			; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]
	; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_13			; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_13
	; GLOBALNESS1-NEXT: ; %bb.22: ; %bb62.i			; GLOBALNESS1-NEXT: ; %bb.22: ; %bb62.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[46:47], v[44:45], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[44:45], v[40:41], off
	; GLOBALNESS1-NEXT: s_branch .LBB1_13			; GLOBALNESS1-NEXT: s_branch .LBB1_13
	; GLOBALNESS1-NEXT: .LBB1_23: ; %LeafBlock			; GLOBALNESS1-NEXT: .LBB1_23: ; %LeafBlock
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_cmp_lg_u32 s39, 0			; GLOBALNESS1-NEXT: s_cmp_lg_u32 s39, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS1-NEXT: s_cselect_b64 s[6:7], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[6:7], -1, 0
	; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[6:7]			; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[6:7]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_9			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_9
	[45 lines not shown]
	; GLOBALNESS1-NEXT: s_mov_b64 s[42:43], s[72:73]			; GLOBALNESS1-NEXT: s_mov_b64 s[42:43], s[72:73]
	; GLOBALNESS1-NEXT: .LBB1_26: ; %Flow15			; GLOBALNESS1-NEXT: .LBB1_26: ; %Flow15
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[70:71]			; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[70:71]
	; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]			; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]
	; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_2			; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_2
	; GLOBALNESS1-NEXT: ; %bb.27: ; %bb67.i			; GLOBALNESS1-NEXT: ; %bb.27: ; %bb67.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_readlane_b32 s6, v41, 2			; GLOBALNESS1-NEXT: v_readlane_b32 s6, v42, 2
	; GLOBALNESS1-NEXT: v_readlane_b32 s7, v41, 3			; GLOBALNESS1-NEXT: v_readlane_b32 s7, v42, 3
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_1			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_1
	; GLOBALNESS1-NEXT: ; %bb.28: ; %bb69.i			; GLOBALNESS1-NEXT: ; %bb.28: ; %bb69.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[44:45], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[40:41], off
	; GLOBALNESS1-NEXT: s_branch .LBB1_1			; GLOBALNESS1-NEXT: s_branch .LBB1_1
	; GLOBALNESS1-NEXT: .LBB1_29: ; %bb73.i			; GLOBALNESS1-NEXT: .LBB1_29: ; %bb73.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[44:45], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[40:41], off
	; GLOBALNESS1-NEXT: s_branch .LBB1_2			; GLOBALNESS1-NEXT: s_branch .LBB1_2
	; GLOBALNESS1-NEXT: .LBB1_30: ; %loop.exit.guard			; GLOBALNESS1-NEXT: .LBB1_30: ; %loop.exit.guard
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], -1			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], -1
	; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_32			; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_32
	; GLOBALNESS1-NEXT: ; %bb.31: ; %bb7.i.i			; GLOBALNESS1-NEXT: ; %bb.31: ; %bb7.i.i
	; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40
	; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS1-NEXT: .LBB1_32: ; %Flow			; GLOBALNESS1-NEXT: .LBB1_32: ; %Flow
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_34			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_34
	; GLOBALNESS1-NEXT: ; %bb.33: ; %bb11.i.i			; GLOBALNESS1-NEXT: ; %bb.33: ; %bb11.i.i
	; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40
	; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS1-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock			; GLOBALNESS1-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock
	;			;
	; GLOBALNESS0-LABEL: kernel:			; GLOBALNESS0-LABEL: kernel:
	; GLOBALNESS0: ; %bb.0: ; %bb			; GLOBALNESS0: ; %bb.0: ; %bb
	; GLOBALNESS0-NEXT: s_mov_b64 s[54:55], s[6:7]			; GLOBALNESS0-NEXT: s_mov_b64 s[54:55], s[6:7]
	; GLOBALNESS0-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0			; GLOBALNESS0-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0
	; GLOBALNESS0-NEXT: s_load_dword s6, s[8:9], 0x14			; GLOBALNESS0-NEXT: s_load_dword s6, s[8:9], 0x14
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v42, v0			; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, v0
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v44, 0			; GLOBALNESS0-NEXT: v_mov_b32_e32 v40, 0
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], 0, 0
	; GLOBALNESS0-NEXT: global_store_dword v[0:1], v44, off			; GLOBALNESS0-NEXT: global_store_dword v[0:1], v40, off
	; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS0-NEXT: global_load_dword v0, v44, s[36:37]			; GLOBALNESS0-NEXT: global_load_dword v0, v40, s[36:37]
	; GLOBALNESS0-NEXT: s_add_u32 flat_scratch_lo, s12, s17			; GLOBALNESS0-NEXT: s_add_u32 flat_scratch_lo, s12, s17
	; GLOBALNESS0-NEXT: s_mov_b64 s[62:63], s[4:5]			; GLOBALNESS0-NEXT: s_mov_b64 s[62:63], s[4:5]
	; GLOBALNESS0-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18			; GLOBALNESS0-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18
	; GLOBALNESS0-NEXT: s_load_dword s7, s[8:9], 0x20			; GLOBALNESS0-NEXT: s_load_dword s7, s[8:9], 0x20
	; GLOBALNESS0-NEXT: s_addc_u32 flat_scratch_hi, s13, 0			; GLOBALNESS0-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
	; GLOBALNESS0-NEXT: s_add_u32 s0, s0, s17			; GLOBALNESS0-NEXT: s_add_u32 s0, s0, s17
	; GLOBALNESS0-NEXT: s_addc_u32 s1, s1, 0			; GLOBALNESS0-NEXT: s_addc_u32 s1, s1, 0
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v45, 0x40994400			; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, 0x40994400
	; GLOBALNESS0-NEXT: s_bitcmp1_b32 s38, 0			; GLOBALNESS0-NEXT: s_bitcmp1_b32 s38, 0
	; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[36:37], s[4:5], v[44:45]			; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[36:37], s[4:5], v[40:41]
	; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], 0			; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], 0
	; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS0-NEXT: s_xor_b64 s[94:95], s[4:5], -1			; GLOBALNESS0-NEXT: s_xor_b64 s[94:95], s[4:5], -1
	; GLOBALNESS0-NEXT: s_bitcmp1_b32 s6, 0			; GLOBALNESS0-NEXT: s_bitcmp1_b32 s6, 0
	; GLOBALNESS0-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]			; GLOBALNESS0-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]
	; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS0-NEXT: s_xor_b64 s[88:89], s[4:5], -1			; GLOBALNESS0-NEXT: s_xor_b64 s[88:89], s[4:5], -1
	; GLOBALNESS0-NEXT: s_bitcmp1_b32 s7, 0			; GLOBALNESS0-NEXT: s_bitcmp1_b32 s7, 0
	[10 lines not shown]
	; GLOBALNESS0-NEXT: s_mov_b64 s[34:35], s[10:11]			; GLOBALNESS0-NEXT: s_mov_b64 s[34:35], s[10:11]
	; GLOBALNESS0-NEXT: s_mov_b64 s[92:93], 0x80			; GLOBALNESS0-NEXT: s_mov_b64 s[92:93], 0x80
	; GLOBALNESS0-NEXT: v_cmp_ne_u32_e64 s[42:43], 1, v1			; GLOBALNESS0-NEXT: v_cmp_ne_u32_e64 s[42:43], 1, v1
	; GLOBALNESS0-NEXT: s_mov_b32 s69, 0x3ff00000			; GLOBALNESS0-NEXT: s_mov_b32 s69, 0x3ff00000
	; GLOBALNESS0-NEXT: s_mov_b32 s32, 0			; GLOBALNESS0-NEXT: s_mov_b32 s32, 0
	; GLOBALNESS0-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63			; GLOBALNESS0-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63
	; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0			; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v41, s4, 0			; GLOBALNESS0-NEXT: v_writelane_b32 v42, s4, 0
	; GLOBALNESS0-NEXT: v_writelane_b32 v41, s5, 1			; GLOBALNESS0-NEXT: v_writelane_b32 v42, s5, 1
	; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0			; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v41, s4, 2			; GLOBALNESS0-NEXT: v_writelane_b32 v42, s4, 2
	; GLOBALNESS0-NEXT: v_writelane_b32 v41, s5, 3			; GLOBALNESS0-NEXT: v_writelane_b32 v42, s5, 3
	; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0			; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v41, s4, 4			; GLOBALNESS0-NEXT: v_writelane_b32 v42, s4, 4
	; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0			; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v41, s5, 5			; GLOBALNESS0-NEXT: v_writelane_b32 v42, s5, 5
	; GLOBALNESS0-NEXT: s_branch .LBB1_4			; GLOBALNESS0-NEXT: s_branch .LBB1_4
	; GLOBALNESS0-NEXT: .LBB1_1: ; %bb70.i			; GLOBALNESS0-NEXT: .LBB1_1: ; %bb70.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_readlane_b32 s6, v41, 4			; GLOBALNESS0-NEXT: v_readlane_b32 s6, v42, 4
	; GLOBALNESS0-NEXT: v_readlane_b32 s7, v41, 5			; GLOBALNESS0-NEXT: v_readlane_b32 s7, v42, 5
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_29			; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_29
	; GLOBALNESS0-NEXT: .LBB1_2: ; %Flow6			; GLOBALNESS0-NEXT: .LBB1_2: ; %Flow6
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[4:5]			; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], 0			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], 0
	; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS0-NEXT: .LBB1_3: ; %Flow19			; GLOBALNESS0-NEXT: .LBB1_3: ; %Flow19
	[31 lines not shown]
	; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a34, v2			; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a34, v2
	; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a33, v1			; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a33, v1
	; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a32, v0			; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a32, v0
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_30			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_30
	; GLOBALNESS0-NEXT: .LBB1_4: ; %bb5			; GLOBALNESS0-NEXT: .LBB1_4: ; %bb5
	; GLOBALNESS0-NEXT: ; =>This Loop Header: Depth=1			; GLOBALNESS0-NEXT: ; =>This Loop Header: Depth=1
	; GLOBALNESS0-NEXT: ; Child Loop BB1_15 Depth 2			; GLOBALNESS0-NEXT: ; Child Loop BB1_15 Depth 2
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]
	; GLOBALNESS0-NEXT: flat_load_dword v40, v[0:1]			; GLOBALNESS0-NEXT: flat_load_dword v44, v[0:1]
	; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40
	; GLOBALNESS0-NEXT: buffer_store_dword v44, off, s[0:3], 0			; GLOBALNESS0-NEXT: buffer_store_dword v40, off, s[0:3], 0
	; GLOBALNESS0-NEXT: flat_load_dword v43, v[0:1]			; GLOBALNESS0-NEXT: flat_load_dword v45, v[0:1]
	; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[42:43]			; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[42:43]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], -1			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], -1
	; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_8			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_8
	; GLOBALNESS0-NEXT: ; %bb.5: ; %NodeBlock			; GLOBALNESS0-NEXT: ; %bb.5: ; %NodeBlock
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	[53 lines not shown]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]
	; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]			; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]
	; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_26			; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_26
	; GLOBALNESS0-NEXT: ; %bb.10: ; %bb33.i			; GLOBALNESS0-NEXT: ; %bb.10: ; %bb33.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: global_load_dwordx2 v[0:1], v[32:33], off			; GLOBALNESS0-NEXT: global_load_dwordx2 v[0:1], v[32:33], off
	; GLOBALNESS0-NEXT: v_readlane_b32 s4, v41, 0			; GLOBALNESS0-NEXT: v_readlane_b32 s4, v42, 0
	; GLOBALNESS0-NEXT: v_readlane_b32 s5, v41, 1			; GLOBALNESS0-NEXT: v_readlane_b32 s5, v42, 1
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_12			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_12
	; GLOBALNESS0-NEXT: ; %bb.11: ; %bb39.i			; GLOBALNESS0-NEXT: ; %bb.11: ; %bb39.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[2:3], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[2:3], 0, 0
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[2:3], v[44:45], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[2:3], v[40:41], off
	; GLOBALNESS0-NEXT: .LBB1_12: ; %bb44.lr.ph.i			; GLOBALNESS0-NEXT: .LBB1_12: ; %bb44.lr.ph.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_cmp_ne_u32_e32 vcc, 0, v43			; GLOBALNESS0-NEXT: v_cmp_ne_u32_e32 vcc, 0, v45
	; GLOBALNESS0-NEXT: v_cndmask_b32_e32 v2, 0, v40, vcc			; GLOBALNESS0-NEXT: v_cndmask_b32_e32 v2, 0, v44, vcc
	; GLOBALNESS0-NEXT: s_mov_b64 s[72:73], s[42:43]			; GLOBALNESS0-NEXT: s_mov_b64 s[72:73], s[42:43]
	; GLOBALNESS0-NEXT: s_mov_b32 s75, s39			; GLOBALNESS0-NEXT: s_mov_b32 s75, s39
	; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS0-NEXT: v_cmp_nlt_f64_e64 s[56:57], 0, v[0:1]			; GLOBALNESS0-NEXT: v_cmp_nlt_f64_e64 s[56:57], 0, v[0:1]
	; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2			; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2
	; GLOBALNESS0-NEXT: s_branch .LBB1_15			; GLOBALNESS0-NEXT: s_branch .LBB1_15
	; GLOBALNESS0-NEXT: .LBB1_13: ; %Flow7			; GLOBALNESS0-NEXT: .LBB1_13: ; %Flow7
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	[32 lines not shown]
	; GLOBALNESS0-NEXT: s_addc_u32 s65, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s65, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]			; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[46:47], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[44:45], 0, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]			; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[46:47], a[32:33], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[44:45], a[32:33], off
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]			; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]
	; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_13			; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_13
	; GLOBALNESS0-NEXT: ; %bb.22: ; %bb62.i			; GLOBALNESS0-NEXT: ; %bb.22: ; %bb62.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[46:47], v[44:45], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[44:45], v[40:41], off
	; GLOBALNESS0-NEXT: s_branch .LBB1_13			; GLOBALNESS0-NEXT: s_branch .LBB1_13
	; GLOBALNESS0-NEXT: .LBB1_23: ; %LeafBlock			; GLOBALNESS0-NEXT: .LBB1_23: ; %LeafBlock
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_cmp_lg_u32 s39, 0			; GLOBALNESS0-NEXT: s_cmp_lg_u32 s39, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS0-NEXT: s_cselect_b64 s[6:7], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[6:7], -1, 0
	; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[6:7]			; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[6:7]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_9			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_9
	[45 lines not shown]
	; GLOBALNESS0-NEXT: s_mov_b64 s[42:43], s[72:73]			; GLOBALNESS0-NEXT: s_mov_b64 s[42:43], s[72:73]
	; GLOBALNESS0-NEXT: .LBB1_26: ; %Flow15			; GLOBALNESS0-NEXT: .LBB1_26: ; %Flow15
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[70:71]			; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[70:71]
	; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]			; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]
	; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_2			; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_2
	; GLOBALNESS0-NEXT: ; %bb.27: ; %bb67.i			; GLOBALNESS0-NEXT: ; %bb.27: ; %bb67.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_readlane_b32 s6, v41, 2			; GLOBALNESS0-NEXT: v_readlane_b32 s6, v42, 2
	; GLOBALNESS0-NEXT: v_readlane_b32 s7, v41, 3			; GLOBALNESS0-NEXT: v_readlane_b32 s7, v42, 3
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_1			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_1
	; GLOBALNESS0-NEXT: ; %bb.28: ; %bb69.i			; GLOBALNESS0-NEXT: ; %bb.28: ; %bb69.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[44:45], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[40:41], off
	; GLOBALNESS0-NEXT: s_branch .LBB1_1			; GLOBALNESS0-NEXT: s_branch .LBB1_1
	; GLOBALNESS0-NEXT: .LBB1_29: ; %bb73.i			; GLOBALNESS0-NEXT: .LBB1_29: ; %bb73.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v45, v44			; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[44:45], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[40:41], off
	; GLOBALNESS0-NEXT: s_branch .LBB1_2			; GLOBALNESS0-NEXT: s_branch .LBB1_2
	; GLOBALNESS0-NEXT: .LBB1_30: ; %loop.exit.guard			; GLOBALNESS0-NEXT: .LBB1_30: ; %loop.exit.guard
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], -1			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], -1
	; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_32			; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_32
	; GLOBALNESS0-NEXT: ; %bb.31: ; %bb7.i.i			; GLOBALNESS0-NEXT: ; %bb.31: ; %bb7.i.i
	; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40
	; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS0-NEXT: .LBB1_32: ; %Flow			; GLOBALNESS0-NEXT: .LBB1_32: ; %Flow
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_34			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_34
	; GLOBALNESS0-NEXT: ; %bb.33: ; %bb11.i.i			; GLOBALNESS0-NEXT: ; %bb.33: ; %bb11.i.i
	; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40
	; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s100
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v42			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43
	; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS0-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock			; GLOBALNESS0-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock
	bb:			bb:
	store i32 0, i32 addrspace(1)* null, align 4			store i32 0, i32 addrspace(1)* null, align 4
	%tmp4 = load i32, i32 addrspace(1)* %arg1.global, align 4			%tmp4 = load i32, i32 addrspace(1)* %arg1.global, align 4

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s \| FileCheck -check-prefix=GCN %s
	; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s \| FileCheck -check-prefix=SI-OPT %s			; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s \| FileCheck -check-prefix=SI-OPT %s

	define hidden void @widget() {			define hidden void @widget() {
	; GCN-LABEL: widget:			; GCN-LABEL: widget:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 16			; GCN-NEXT: v_writelane_b32 v42, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 16			; GCN-NEXT: v_readlane_b32 s33, v42, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	; GCN-NEXT: .LBB0_9: ; %bb2			; GCN-NEXT: .LBB0_9: ; %bb2
	; GCN-NEXT: v_cmp_eq_u32_e64 s[46:47], 21, v0			; GCN-NEXT: v_cmp_eq_u32_e64 s[46:47], 21, v0
	; GCN-NEXT: v_cmp_ne_u32_e64 s[6:7], 21, v0			; GCN-NEXT: v_cmp_ne_u32_e64 s[6:7], 21, v0
	; GCN-NEXT: s_mov_b64 vcc, exec			; GCN-NEXT: s_mov_b64 vcc, exec
	; GCN-NEXT: s_cbranch_execnz .LBB0_2			; GCN-NEXT: s_cbranch_execnz .LBB0_2
	; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4			; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4
	; SI-OPT-NEXT: br label [[BB2]]			; SI-OPT-NEXT: br label [[BB2]]
	;			;
	; GCN-LABEL: blam:			; GCN-LABEL: blam:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 18			; GCN-NEXT: v_writelane_b32 v46, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s \| FileCheck -check-prefix=GFX11 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s \| FileCheck -check-prefix=GFX11 %s

	declare void @extern_func() #2			declare void @extern_func() #2

	define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs not be			; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs not be
	; preserved across the call and should get 8 scratch registers.			; preserved across the call and should get 8 scratch registers.
	; GFX9-LABEL: non_preserved_vgpr_tuple8:			; GFX9-LABEL: non_preserved_vgpr_tuple8:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v45, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v36, v16			; GFX9-NEXT: v_mov_b32_e32 v36, v16
	; GFX9-NEXT: v_mov_b32_e32 v35, v15			; GFX9-NEXT: v_mov_b32_e32 v35, v15
	; GFX9-NEXT: v_mov_b32_e32 v34, v14			; GFX9-NEXT: v_mov_b32_e32 v34, v14
	; GFX9-NEXT: v_mov_b32_e32 v33, v13			; GFX9-NEXT: v_mov_b32_e32 v33, v13
	; GFX9-NEXT: v_mov_b32_e32 v32, v12			; GFX9-NEXT: v_mov_b32_e32 v32, v12
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: v_mov_b32_e32 v3, v44			; GFX9-NEXT: v_mov_b32_e32 v3, v44
	; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v45, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: non_preserved_vgpr_tuple8:			; GFX10-LABEL: non_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v36, v16			; GFX10-NEXT: v_mov_b32_e32 v36, v16
	; GFX10-NEXT: v_mov_b32_e32 v35, v15			; GFX10-NEXT: v_mov_b32_e32 v35, v15
	; GFX10-NEXT: v_mov_b32_e32 v34, v14			; GFX10-NEXT: v_mov_b32_e32 v34, v14
	; GFX10-NEXT: v_mov_b32_e32 v33, v13			; GFX10-NEXT: v_mov_b32_e32 v33, v13
	; GFX10-NEXT: v_mov_b32_e32 v32, v12			; GFX10-NEXT: v_mov_b32_e32 v32, v12
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v45, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: s_clause 0x3			; GFX10-NEXT: s_clause 0x3
	; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v45, 0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16
				; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:20
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: non_preserved_vgpr_tuple8:			; GFX11-LABEL: non_preserved_vgpr_tuple8:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:16 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:16
				; GFX11-NEXT: scratch_store_b32 off, v45, s32 offset:20
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_dual_mov_b32 v36, v16 :: v_dual_mov_b32 v35, v15			; GFX11-NEXT: v_dual_mov_b32 v36, v16 :: v_dual_mov_b32 v35, v15
	; GFX11-NEXT: v_dual_mov_b32 v34, v14 :: v_dual_mov_b32 v33, v13			; GFX11-NEXT: v_dual_mov_b32 v34, v14 :: v_dual_mov_b32 v33, v13
	; GFX11-NEXT: v_mov_b32_e32 v32, v12			; GFX11-NEXT: v_mov_b32_e32 v32, v12
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v45, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12			; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12
	; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8			; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8
	; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:4
	; GFX11-NEXT: scratch_store_b32 off, v44, s33			; GFX11-NEXT: scratch_store_b32 off, v44, s33
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_load_b32 v44, off, s33			; GFX11-NEXT: scratch_load_b32 v44, off, s33
	; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:4
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v45, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:16 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:16
				; GFX11-NEXT: scratch_load_b32 v45, off, s32 offset:20
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]





	; across the call and should get allcoated to 8 CSRs.			; across the call and should get allcoated to 8 CSRs.
	; Only the lower 5 sub-registers of the tuple are preserved.			; Only the lower 5 sub-registers of the tuple are preserved.
	; The upper 3 sub-registers are unused.			; The upper 3 sub-registers are unused.
	; GFX9-LABEL: call_preserved_vgpr_tuple8:			; GFX9-LABEL: call_preserved_vgpr_tuple8:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v46, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_mov_b32_e32 v45, v16			; GFX9-NEXT: v_mov_b32_e32 v45, v16
	; GFX9-NEXT: v_mov_b32_e32 v44, v15			; GFX9-NEXT: v_mov_b32_e32 v44, v15
	; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v46, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_preserved_vgpr_tuple8:			; GFX10-LABEL: call_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v46, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_addk_i32 s32, 0x400			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v46, 0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20
				; GFX10-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:24
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_preserved_vgpr_tuple8:			; GFX11-LABEL: call_preserved_vgpr_tuple8:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:20 ; 4-byte Folded Spill			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_store_b32 off, v40, s32 offset:20
				; GFX11-NEXT: scratch_store_b32 off, v46, s32 offset:24
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: v_writelane_b32 v40, s33, 2			; GFX11-NEXT: v_writelane_b32 v46, s33, 0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_clause 0x4			; GFX11-NEXT: s_clause 0x4
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:16			; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:16
	; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:12			; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:12
	; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:8			; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:8
	; GFX11-NEXT: scratch_store_b32 off, v44, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v44, s33 offset:4
	; GFX11-NEXT: scratch_store_b32 off, v45, s33			; GFX11-NEXT: scratch_store_b32 off, v45, s33
	; GFX11-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX11-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX11-NEXT: scratch_load_b32 v45, off, s33			; GFX11-NEXT: scratch_load_b32 v45, off, s33
	; GFX11-NEXT: scratch_load_b32 v44, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v44, off, s33 offset:4
	; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:8			; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:8
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:12
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:16			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:16
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: v_readlane_b32 s33, v40, 2			; GFX11-NEXT: v_readlane_b32 s33, v46, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:20 ; 4-byte Folded Reload			; GFX11-NEXT: s_clause 0x1
				; GFX11-NEXT: scratch_load_b32 v40, off, s32 offset:20
				; GFX11-NEXT: scratch_load_b32 v46, off, s32 offset:24
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]






llvm/test/CodeGen/AMDGPU/wave32.ll


	; GCN-LABEL: {{^}}callee_no_stack_with_call:			; GCN-LABEL: {{^}}callee_no_stack_with_call:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_waitcnt_vscnt			; GCN-NEXT: s_waitcnt_vscnt

	; GFX1064-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GFX1064-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GFX1032-NEXT: s_or_saveexec_b32 [[COPY_EXEC0:s[0-9]+]], -1{{$}}			; GFX1032-NEXT: s_or_saveexec_b32 [[COPY_EXEC0:s[0-9]+]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]

	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GFX1064: s_addk_i32 s32, 0x400			; GFX1064: s_addk_i32 s32, 0x400
	; GFX1032: s_addk_i32 s32, 0x200			; GFX1032: s_addk_i32 s32, 0x200


	; GCN-DAG: v_writelane_b32 v40, s30, 0			; GCN-DAG: v_writelane_b32 v40, s30, 0
	; GCN-DAG: v_writelane_b32 v40, s31, 1			; GCN-DAG: v_writelane_b32 v40, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-DAG: v_readlane_b32 s30, v40, 0			; GCN-DAG: v_readlane_b32 s30, v40, 0
	; GCN-DAG: v_readlane_b32 s31, v40, 1			; GCN-DAG: v_readlane_b32 s31, v40, 1


	; GFX1064: s_addk_i32 s32, 0xfc00			; GFX1064: s_addk_i32 s32, 0xfc00
	; GFX1032: s_addk_i32 s32, 0xfe00			; GFX1032: s_addk_i32 s32, 0xfe00
	; GCN: v_readlane_b32 s33, v40, 2			; GCN: v_readlane_b32 s33, v41, 0
	; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}			; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: s_clause 0x1
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC1]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @callee_no_stack_with_call() #1 {			define void @callee_no_stack_with_call() #1 {
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll

	; GFX9-O0-LABEL: strict_wwm_call:			; GFX9-O0-LABEL: strict_wwm_call:
	; GFX9-O0: ; %bb.0:			; GFX9-O0: ; %bb.0:
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: v_writelane_b32 v3, s33, 2			; GFX9-O0-NEXT: s_mov_b32 s35, s33
	; GFX9-O0-NEXT: s_mov_b32 s33, s32			; GFX9-O0-NEXT: s_mov_b32 s33, s32
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x400			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x400
	; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0
	; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-O0-NEXT: s_mov_b32 s36, s4			; GFX9-O0-NEXT: s_mov_b32 s36, s4
	; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39			; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
	; GFX9-O0-NEXT: s_mov_b32 s37, s5			; GFX9-O0-NEXT: s_mov_b32 s37, s5
	; GFX9-O0-NEXT: s_mov_b32 s38, s6			; GFX9-O0-NEXT: s_mov_b32 s38, s6
	; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0			; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0
	; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2			; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2
	; GFX9-O0-NEXT: s_mov_b64 exec, s[40:41]			; GFX9-O0-NEXT: s_mov_b64 exec, s[40:41]
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1			; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4			; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4
	; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; GFX9-O0-NEXT: v_readlane_b32 s33, v3, 2			; GFX9-O0-NEXT: s_mov_b32 s33, s35
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: s_setpc_b64 s[30:31]			; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-O3-LABEL: strict_wwm_call:			; GFX9-O3-LABEL: strict_wwm_call:
	; GFX9-O3: ; %bb.0:			; GFX9-O3: ; %bb.0:
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: v_writelane_b32 v3, s33, 2
	; GFX9-O3-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-O3-NEXT: v_writelane_b32 v3, s30, 0
				; GFX9-O3-NEXT: s_mov_b32 s38, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s32			; GFX9-O3-NEXT: s_mov_b32 s33, s32
	; GFX9-O3-NEXT: s_addk_i32 s32, 0x400			; GFX9-O3-NEXT: s_addk_i32 s32, 0x400
	; GFX9-O3-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-O3-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8			; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0			; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O3-NEXT: s_getpc_b64 s[36:37]			; GFX9-O3-NEXT: s_getpc_b64 s[36:37]
	; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called@rel32@lo+4			; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called@rel32@lo+4
	; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called@rel32@hi+12			; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called@rel32@hi+12
	; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v0			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v0
	; GFX9-O3-NEXT: v_add_u32_e32 v1, v1, v2			; GFX9-O3-NEXT: v_add_u32_e32 v1, v1, v2
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v1			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O3-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O3-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O3-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O3-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-O3-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-O3-NEXT: v_readlane_b32 s33, v3, 2			; GFX9-O3-NEXT: s_mov_b32 s33, s38
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: s_setpc_b64 s[30:31]			; GFX9-O3-NEXT: s_setpc_b64 s[30:31]
	%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)			%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)
	; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: v_writelane_b32 v10, s33, 8			; GFX9-O0-NEXT: s_mov_b32 s42, s33
	; GFX9-O0-NEXT: s_mov_b32 s33, s32			; GFX9-O0-NEXT: s_mov_b32 s33, s32
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xc00			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xc00
	; GFX9-O0-NEXT: v_writelane_b32 v10, s30, 0			; GFX9-O0-NEXT: v_writelane_b32 v10, s30, 0
	; GFX9-O0-NEXT: v_writelane_b32 v10, s31, 1			; GFX9-O0-NEXT: v_writelane_b32 v10, s31, 1
	; GFX9-O0-NEXT: s_mov_b32 s34, s8			; GFX9-O0-NEXT: s_mov_b32 s34, s8
	; GFX9-O0-NEXT: s_mov_b32 s36, s4			; GFX9-O0-NEXT: s_mov_b32 s36, s4
	; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39			; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
	; GFX9-O0-NEXT: s_mov_b32 s37, s5			; GFX9-O0-NEXT: s_mov_b32 s37, s5
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3			; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3
	; GFX9-O0-NEXT: s_mov_b32 s34, 0			; GFX9-O0-NEXT: s_mov_b32 s34, 0
	; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[36:39], s34 offset:4			; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[36:39], s34 offset:4
	; GFX9-O0-NEXT: v_readlane_b32 s31, v10, 1			; GFX9-O0-NEXT: v_readlane_b32 s31, v10, 1
	; GFX9-O0-NEXT: v_readlane_b32 s30, v10, 0			; GFX9-O0-NEXT: v_readlane_b32 s30, v10, 0
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff400			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff400
	; GFX9-O0-NEXT: v_readlane_b32 s33, v10, 8			; GFX9-O0-NEXT: s_mov_b32 s33, s42
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v8, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v8, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: v_writelane_b32 v8, s33, 2
	; GFX9-O3-NEXT: v_writelane_b32 v8, s30, 0			; GFX9-O3-NEXT: v_writelane_b32 v8, s30, 0
				; GFX9-O3-NEXT: s_mov_b32 s40, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s32			; GFX9-O3-NEXT: s_mov_b32 s33, s32
	; GFX9-O3-NEXT: s_addk_i32 s32, 0x800			; GFX9-O3-NEXT: s_addk_i32 s32, 0x800
	; GFX9-O3-NEXT: v_writelane_b32 v8, s31, 1			; GFX9-O3-NEXT: v_writelane_b32 v8, s31, 1
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: s_getpc_b64 s[36:37]			; GFX9-O3-NEXT: s_getpc_b64 s[36:37]
	; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called_i64@gotpcrel32@lo+4			; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called_i64@gotpcrel32@lo+4
	; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called_i64@gotpcrel32@hi+12			; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called_i64@gotpcrel32@hi+12
	; GFX9-O3-NEXT: s_load_dwordx2 s[36:37], s[36:37], 0x0			; GFX9-O3-NEXT: s_load_dwordx2 s[36:37], s[36:37], 0x0
	; GFX9-O3-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v7, vcc			; GFX9-O3-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v7, vcc
	; GFX9-O3-NEXT: s_mov_b64 exec, s[38:39]			; GFX9-O3-NEXT: s_mov_b64 exec, s[38:39]
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v3			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v3
	; GFX9-O3-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: v_readlane_b32 s31, v8, 1			; GFX9-O3-NEXT: v_readlane_b32 s31, v8, 1
	; GFX9-O3-NEXT: v_readlane_b32 s30, v8, 0			; GFX9-O3-NEXT: v_readlane_b32 s30, v8, 0
	; GFX9-O3-NEXT: s_addk_i32 s32, 0xf800			; GFX9-O3-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-O3-NEXT: v_readlane_b32 s33, v8, 2			; GFX9-O3-NEXT: s_mov_b32 s33, s40
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_load_dword v8, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v8, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload

Diff 482885
